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ABSTRACT 


The  multi-backend  database  system(MBDS)  is  a  research  effort  conducted 
jointly  by  the  Naval  Postgraduate  School  and  Ohio  State  University  with  the 
sponsorship  of  the  STARS  foundation.  The  MBDS  is  designed  to  overcome  the 
capacity  growth  and  performance  gain  problems  of  the  traditional  database  sys¬ 
tems  and  the  single-backend  database  systems. 

To  verify  the  aforementioned  capacity  growth  and  performance  claims,  a 
benchmarking  methodolgy  has  been  formulated  in  the  Naval  Postgraduate  Thesis. 
"A  Performance  Measurement  Methodology  for  Software  Multiple-Backend  Data¬ 
base  Systems"  by  James  R.  Vincent.  This  thesis  presents  the  implementation  of 
the  methodology  in  the  form  of  a  computer-aided  design  (CAD)  system  for  the 
generation  of  test  databases  and  test-transaction  mixes  for  benchmarking  MBDS. 
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I.  INTRODUCTION 


A.  THE  BACKGROUND 

Information  processing  has  a  special  significance  in  today's  real  world 
applications.  Government,  private,  and  commercial  organizations  are  increasingly 
dependent  on  timely,  accurate  information.  Within  the  computer  science 
community,  several  areas  of  research  have  sprouted  new.  innovative  approaches  to 
satisfy'  the  information  needs  of  these  organizations.  Today,  numerous  variations 
of  database  systems  have  flooded  the  market.  Database  systems  consist  of  both 
hardware  components  and  software  packages  designed  for  fast,  accurate,  efficient, 
and  economical  information  processing.  These  systems  tire  better  known  as 
database  management  systems  (DBMS). 

1.  Three  Database- System  Approaches 

Three  database- system  approaches  have  emerged:  1)  the  traditional 
mainframe-based  system.  2)  the  software  single-backend  system,  and  3)  the 
software  multiple-backend  system  [Ref.  l:pp.  3-5j.  The  software  multiple-backend 
system  is  designed  to  overcome  the  upgrade  and  performance  problems 
experienced  with  both  the  traditional  and  the  single-backend  systems.  This 
system  is  more  unconventional  than  the  first  two  and  is  a  new  kind  of  database 


computer.  The  software  multiple-backend  system  is  new  because  it  resembles 


neither  the  the  traditional  approach  to  database  management  by  placing  the 
system  software  in  a  mainframe  computer  such  as  SQL/DS  in  an  IBM  3033.  nor 
the  more  recent  approach  to  database  management  by  utilizing  the  dedicated 
hardware 


Mainframe 


Figure  1.  The  Traditional  Approach  to  Database  Management 


and  software  in  a  single-backend  computer  such  as  the  Britton- Lee  IDM  500. 

In  the  mainframe-based  approach,  a  database  system  is  characterized 
as  an  applications  program  (albeit,  a  major  one)  which  shares  the  resources  of  the 
mainframe  computer  with  other  applications  programs  (depicted  in  Figure  1).  In 
the  single-backend  approach  a  database  system  has  the  exclusive  control  and 
use  of  the  resources  of  the  entire  backend  computer  (depicted  in  Figure  2).  The 
term  backend  connotes  the  'back'  of  terminals  or  general-purpose  computers, 
where  neither  the  terminals  nor  the  general-purpose  computers  (termed  the 
hosts)  provide  the  database  management  services.  Instead,  the  database 


Ho«( 


Backend 


Figure  2.  The  Software  Single-Backend  Approach  to  Database  Management 

management  services  are  provided  by  the  backend  computer  to  the  user  or  user 
programs  (transactions)  via  the  host. 

2.  Software  Multiple-Backend  Database  Systems 

The  new  kind  of  database  computers  (depicted  in  Figure  3)  is  of  the 
multiple-backend  approach  where  the  database  system  is  not  mainframe- 
based  and  each  database  system  consists  of  at  least  one  controller  and  two  or 
more  bac-  mds  with  their  disk  systems  interconnected  by  a  network.  The 
controller  software  is  mainly  dedicated  to  1)  the  communication  with  the  hosts 
and  the  terminals,  2)  the  scheduling  and  control  of  transaction  executions  by  the 
backends,  and  3)  the  routing  of  the  responses  coming  from  the  backends  back  to 
the  user.  The  backend  software  in  each  of  the  multiple  backends  is  identical  and 
is  responsible  for  carrying  out  the  primary  database  operations  such  as  the 
retrieve,  insert,  delete,  and  update  for  the  transactions.  Examples  of  the 

multiple-backend  approach  to  database  management  are  the  commercial  Teradata 
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DBC/1012  system,  and  the  Naval  Postgraduate  School's  experimental  multi¬ 
backend  database  system  (MDBS)  [Ref.  l:pp.  5-6]. 

Unlike  the  mainframe-based  and  single-backend  approaches,  the 
multiple-backend  approach  emphasizes  great-capacity  and  high-performance 
database  management.  Furthermore,  it  attempts  to  relate  the  capacity  growth 
and  performance  gains  to  the  number  of  backends  in  the  system.  In  other  words, 
when  new  backends  and  their  disk  systems  are  added  to  a  multiple-backend 


Figure  3.  The  Software  Multiple-Backend  Approach  to  Database  Management 


database  computer,  an  increase  in  both  the  capacity  and  the  performance  would 


likely  be  produced.  The  MDBS  system  is  expandable  in  terms  of  tens  of  backends 
and  associated  disk  systems,  and  the  DBC/1012  is  expandable  in  terms  of 
hundreds  of  backends  and  disks.  Certainly,  in  the  former  system,  the  capacity 
growth  and  performance  can  be  measured  in  tens  and  in  the  latter  in  hundreds. 

B.  PERFORMANCE  AND  CAPACITY-GROWTH  CLAIMS 

To  overcome  the  performance  problems  and  upgrade  issues  of  the  traditional 
mainframe-based  approach  and  of  the  conventional  software  single-backend 
approach,  the  software  multiple-backend  approach  provides  performance  gains 
through  specialization  of  the  database  operations  on  dedicated,  multiple  backends. 
The  system  utilizes  multiple  backends  connected  in  a  parallel  fashion  in  order  to 
achieve  performance  gains  and  capacity  growth.  The  backend  controller  is 
responsible  for  supervising  the  execution  of  database  transactions  and  for  the 
interfacing  with  the  hosts  and  users.  The  backends  perform  the  database 
operations  with  the  database  stored  and  evenly  distributed  across  the  disk  systems 
of  the  backends.  The  controller  and  backends  are  connected  by  a  communications 
bus.  Users  access  the  system  either  by  way  of  the  hosts  or  through  the  controller 
directly. 

The  two  goals  of  the  software  multi-backend  database  system  are  of  course  to 
overcome  the  performance  problems  and  upgrade  issues  of  the  traditional 
mainframe-based  or  the  conventional  software  single-backend  database  systems. 


First,  by  increasing  the  number  of  backends,  while  the  size  of  the  database  and 


the  size  of  the  responses  to  the  transactions  remain  constant,  the  database  system 


is  to  produce  a  reciprocal  decrease  in  the  response  times  of  the  user  transactions. 
Second,  by  increasing  the  number  of  backends  proportionally  to  the  increase  of 
transaction  responses,  the  database  system  is  to  produce  invariant  response  times 
for  the  user  transactions. 

The  first  goal  allows  the  multiplicity  of  the  backends  of  the  database  system 
to  be  directly  related  to  the  performance  gains  of  the  database  system  in  terms 
of  the  response-time  reduction.  Response-time  reduction  means  the  amount  of 
reduction  in  the  response  time  of  a  request,  when  the  request  is  processed  in  a 
system  with  several  backends  as  opposed  to  processing  the  same  transaction  in  a 
one  backend  system,  while  using  the  same  test-database  set[Ref.  l:p.  24].  The 
second  goal  enables  the  multiplicity  of  the  backends  of  the  system  to  be  directly 
related  to  the  capacity  growth  of  the  system  in  terms  of  response-time 
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invariance.  Response-time  invariance  means  the  amount  of  change  in  the 
response  time  of  a  request,  when  the  request  is  processed  in  a  one  backend  system 
with  a  response  set  of  x  records,  as  opposed  to  processing  the  same  transaction 
in  a  system  with  m  backends  with  a  response  set  of  mx  records  [Ref.  l:p.  24]. 
Since  the  size  of  the  response  set  for  a  request  is  determined  by  the  size  of  the 
database  (i.e.,  large  databases  generate  more  responses  for  the  same  request),  the 
definition  of  response-time  invariance  can  be  restated  as  the  amount  of  change  in 

the  response  time  of  a  request,  when  the  request  is  processed  in  a  one  backend 
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system  with  a  database  size  of  x  records,  as  opposed  to  processing  the  same 


transaction  in  a  system  with  m  backends  with  a  database  size  of  mx  records. 


(Response  set  means  the  set  of  responses  returned  by  the  backends(s)  to  the  user 
as  a  result  of  processing  a  transaction.) 


C.  THE  BENCHMARK  OVERVIEW 

To  verify  the  aforementioned  performance  and  growth-capacity  claims. 
Vincent  has  formulated  a  benchmarking  methodology'  for  software  multiple- 
backend  database  systems.  Vincent’s  work  [Ref.  2]  provides  a  logical  continuation 
of  two  prior  projects  aimed  at  developing  a  comprehensive  performance 
measurement  methodology.  Kovalchik  [Ref.  3]  has  developed  a  set  of  performance 
measurement  tools  for  conducting  system  testing.  Tekampe  and  Watson  [Ref.  4] 
have  developed  a  performance  measurement  methodology  for  database  systems  to 
provide  boih  external  and  internal  performance  measurements  by  embedding 
timing  checkpoints  in  the  MBDS  software  to  provide  the  required  measurements. 
The  information  provided  by'  the  external  checkpoints  provides  a  measure  of  the 
response  time. 

Vincent's  methodology'  articulates  numerous  parameters  to  insure  database- 
independence  and  application-independence  when  generating  the  benchmarking 
information.  Thus,  the  design  of  the  benchmarking  methodology  is  complicated. 
There  is  the  need: 


1)  to  have  test  databases  which  can  be  used  for  testing  backends  of  varying 
numbers,  for  deriving  partitions  (clusters)  of  a  database,  and  for  placing 
the  partitions  on  parallel  stores: 

2)  to  have  test-transaction  mixes  which  can  be  used  for  measuring  primary 
database  operations  in  terms  of  their  response  times,  for  verifying  the 
response-time  reductions  due  to  additions  of  backends  and  the 
redistribution  of  the  same  database,  for  clarifying  the  response-time 
invariance  on  account  of  various  growths  in  database  capacity  with 
corresponding  additions  of  backends  and  backend  stores; 

3)  to  have  systematic  ways  to  generate  the  test  databases  and  the  test- 
transaction  mixes,  to  conduct  the  tests,  to  collect  the  test  results,  to 
interpret  the  results  and  to  verify  the  results  against  established  measures 
(benchmarks). 

The  major  portions  of  the  design  of  the  benchmarking  methodology  consist  of  the 
articulation  and  establishment  of  the  measurement  criteria  and  measures,  the 
design,  interpretation,  and  generation  of  the  test  databases,  the  test-transaction 
mixes,  the  test  procedures,  and  the  test  configurations. 

The  focus  of  this  thesis  then  is  the  design  and  implementation  of  a 
computer-aided  design  (CAD)  system  for  the  generation  of  test  transactions  and 
test  databases  that  may  be  used  for  the  benchmarking  of  parallel,  multiple- 
backend  database  systems.  One  of  the  most  salient  features  of  the  CAD  system  is 
to  reduce  the  amount  of  user  input.  The  user  needs  only  to  input  three  essential 
elements  of  information: 

•  the  number  of  backends  in  the  system  to  be  tested. 

•  the  amount  of  disk  storage  per  backend,  and 

•  the  size  of  the  data  transfer  from  the  secondary  storage  (disk)  to  the 
primary  storage  (main  memory). 


Once  the  user  has  provided  these  values,  the  CAD  system  automatically  generates 
the  test  databases  and  the  test  transaction  mixes.  Furthermore,  it  yields  a 
detailed  report  that  guides  the  database  evaluator  through  the  testing  process. 
The  CAD  system  is  also  very  generic  in  form,  and  is  able  to  generate  test 
database  and  transaction  mixes  for  any  combination  of  the  input  test  variables. 

The  CAD  system  described  in  this  thesis  generates  the  test  databases,  the  test 
transaction  mixes,  and  an  evaluator's  guide  for  benchmarking  parallel,  multiple 
backend  database  systems.  This  CAD  system  is  a  first  version;  the  second  version 
will  be  integrated  with  MBDS,  allowing  the  testing  process  to  be  controlled  and 
managed  by  the  CAD  system.  The  third  version  will  add  components  to  collect 
statistics  (e.g.,  response  times)  for  the  different  tests  and  calculate  statistics  (i.e.. 
mean  and  standard  deviation  of  tests,  response-time  reductions  and  response-time 
invariances)  that  measure  the  performance  of  MBDS. 

D.  ORGANIZATION  OF  THE  THESIS 

The  thesis  is  organized  into  three  chapters  in  addition  to  this  introduction. 
Chapter  II  provides  a  general  overview  of  the  CAD  system  structure.  Chapter  III 
presents  the  detailed  design  features  and  focuses  on  the  major  components  of  the 
CAD  system:  the  generated  test  databases  and  transaction  mixes,  and  the 
framework  for  the  detailed  user's  guide  (evaluator’s  report).  Chapter  IV  presents 
a  summary  and  the  conclusions  of  the  thesis,  and  offers  future  work  in 


II.  GENERAL  OVERVIEW 


Chapter  II  presents  a  three-part  overview  of  the  general  structure  of  the  CAD 
system  for  generating  test-databases  and  test-transaction  mixes.  Part  one 
describes  the  essence  of  Vincent’s  methodology  [Ref.  2).  Part  two  briefly  describes 
the  prototype  multi-backend  database  system  under  development  at  the 
Laboratory  for  Database  Systems  Research,  Naval  Postgraduate  School. 
Monterey.  California.  Finally,  part  three  introduces  the  framework  of  the  CAD 
system. 

A.  TEST-DATABASE  AND  TEST-TRANSACTION  DESIGN  FACTORS 

The  database  design  factors  presented  in  Vincent's  thesis  [Ref.  2:pp.  29-48] 
are  the  backbone  not  only  of  his  methodology,  but  of  the  CAD  system  as  well. 
The  generated  test-database  sets  and  the  generated  test-transaction  mixes  are  the 
two  major  components  of  his  performance  measurement.  Essentially.  Vincent's 
methodology  describes  how  to  create  these  test-database  sets  and  test-transaction 
mixes.  The  creation  of  the  database  sets  and  the  transaction  mixes  is  driven  by 
three  critical  elements  of  information  solicited  from  the  user. 

•  the  number  of  backends  in  the  system  to  be  tested. 

•  the  amount  of  disk  storage  per  backend,  and 

•  the  size  of  the  data  transfer  from  the  secondary  storage  to  the  primary 
storage. 


The  database  design  factors  are  the  svstem  configuration  considerations,  the 


For  a  given  test  database,  two  sets  of  configurations  must  be  generated,  a 


set  for  the  measurement  of  the  response-time  reduction,  and  a  set  for  the 
measurement  of  the  response-time  invariance.  The  number  of  configurations 
within  each  set  is  determined  by  the  number  of  backends  of  the  system  to  be 
tested.  A  configuration  may  range  from  1  backend  to  M  backends.  Consider  a 
system  with  M  backends  and  N  number  of  bytes  in  the  database.  The  database 
of  N  bytes  must  be  evenly  distributed  over  the  backends.  Depending  on  the 

configuration  being  used,  the  database  must  be  evenly  distributed  to  1,  2.  3 . 

or  M  backends.  To  determine  a  database  size  which  permits  an  equal  distribution 
of  data  to  each  backend  in  the  system,  the  least  common  multiple  (  1cm)  must 

be  calculated  for  the  possible  system  configurations  of  1.  2,  3 . or  M  backends. 

The  1cm  is  used  in  another  calculation,  the  database-size  multiple,  or  dbm. 
The  dbm  is  the  ultimate  factor  used  to  determine  the  correct  database  size  which 
allows  for  equal  distribution  of  the  database  across  M  backends.  The  dbm  is 
discussed  later. 

The  total  number  of  configurations  for  a  given  test  database  is 


t  V.--, 
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determined  by  the  equation: 


Number  of  configurations  =  M  +  (M  -  1)  =  2M  -  1 
where  M  is  the  total  number  of  backends  to  be  tested  in  the  svstem. 


To  verify  the  performance-gain  claim,  the  database  must  be  distributed 

evenly  across  1.  2.  3 . M  backends,  with  each  distribution  comprising  a 

configuration.  For  example,  given  a  system  with  two  backends,  two  configurations 
are  need  to  test  the  response-time  reduction  claim.  The  first  configuration  has  the 
entire  database  on  one  backend,  and  the  second  configuration  has  the  database 
split  evenly  between  the  two  backends.  Thus  for  the  response-time  reduction 
calculations  of  a  M-backend  system,  M  configurations  are  needed. 

To  verify  the  growth-capacity  claim,  the  database  size  must  increase  by  a 

factor  of  2.  3 . M  .  with  each  configuration  reflecting  the  incremental  increase 

in  size  up  to  M.  Again,  given  a  system  with  two  backends,  two  configurations  are 
needed  to  test  the  response-time  invariance  claim.  The  first  configuration  has  the 
entire  database  on  one  backend.  The  second  configuration  then  doubles  the  size 
of  the  database  and  distributes  it  evenly  over  two  backends.  Thus  for  the 
response-time  invariance  calculations  of  a  M-backend  system.  M  configurations 
are  also  needed.  Tables  1  and  2  reflect  the  database  allocation  with  respect  to  the 
number  of  configurations.  Notice  however,  that  the  first  configuration  for  the 
response-time  reduction  set  and  the  first  configuration  for  the  response-time 
invariance  set  are  the  same  configuration.  Eliminating  the  creation  of  the 
duplicate  configuration,  the  total  number  of  test  configurations  is  equal  to  (M  + 
M  -1)  or  2M  -  1.  Tables  3-5  summarize  this  information  for  systems  configured 
with  a  maximum  of  two.  three,  and  four  backends,  respectively. 
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TABLE  1  DATABASE  ALLOCATION  WITH  RESPECT  TO 
RESPONSE  TIME  REDUCTION 


Allocated  per  Backend 


KEY.  M  =  number  of  backends 

N  =  total  number  of  bytes  in  database 


TABLE  2  DATABASE  ALLOCATION  WITH  RESPECT  TO 
RESPONSE-TIME  INVARIANCE 


Configuration  Maximum  Number  Total  Database 
Number  of  Backends  Size  in  Mbytes. 


1  |  1  N 


2 


KEY.  M  =  number  of  backends. 

N  =  total  number  of  bvtes  in  database. 


2.  Database-size  Considerations 

To  adequately  measure  the  performance  characteristics  of  a  software 
multiple-backend  system,  three  different  database  sizes  are  preferred  [Ref.2: 
p.33].  all  of  which  are  a  multiple  of  the  base  (original)  size  N.  One  size  represents 
a  small  database  (N/4),  another  size  represents  an  intermediate  size  database 
(N/2)-  and  the  final  size  is  the  base  database  size  N.  The  base  (original)  database 
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size  X  is  determined  by  the  database-size  multiple.  Idbmj.  The  dbm  is  calculated 
as  follows: 

dbm  =  lcm{  1.2.3,. ...M}  x  32  x  record-size 
The  1cm  (least  common  multiple)  is  determined  using  the  maximum  number  of 


TABLE  S.  TEST  CONFIGURATIONS  WITH  TWO  BACKENDS. 


Configuration 

Number: 

Number  of 
Backends: 

Mbytes  per 
Backend: 

Total  database 

Size  in  Mbytes 

1 

1 

N 

N 

2 

2 

N/2 

N 

3 

2 

N 

2N 

Note:  | 

Configuration’s  {1.2}  are  required  to  verify  the  response-time  reduction  claim 
Configuration’s  {1.3}  are  required  to  verify  the  response-time  invariance  claim 


TABLE  4.  TEST  CONFIGURATIONS  WITH  THREE  BACKENDS. 


Configuration 

Number: 

Number  of 
Backends: 

Mbytes  per 
Backend: 

Total  database 

Size  in  Mbytes 

1 

1 

N 

N 

2 

2 

N  2 

N 

3 

3 

N  3 

N 

4 

2 

N 

2N 

5 

9 

U 

N 

3N 

1  Note:  j 

Configuration's  { 1.2.3}  are  required  to  verify  the  response-time  reduction  claim 
Configuration’s  {1.4,5}  are  required  to  verify  the  response-time  invariance  claim 


TABLE  5.  TEST  CONFIGURATIONS  WITH  FOUR  BACKENDS 


Configuration  Number  of  Mbytes  per  i  Total  database 

Number:  |  Backends:  Backend:  j  Size  in  Mbytes 


1  1  N  N 

2  2  N72  N 


Note: 

Configuration’s  {1.2, 3, 4}  are  required  to  verify  the  response-time  reduction  claim 
Configuration’s  { 1 ,5,6, 7 }  are  required  to  verify  the  response-time  invariance  claim 


backends.  M.  in  the  system.  The  record-size  value  is  hardware  specific  because  it 
depends  on  the  size  of  the  unit  of  data  management  of  the  particular  system's 
architecture  [Ref.  5:pp.  16-17].  The  size  of  the  unit  of  data  management  is  the  size 
of  the  data  transfer  from  the  secondary  storage  to  the  primary-  storage.  This 
element  of  information  is  one  of  three  elements  of  information  solicited  from  the 
user.  Strawser  [Ref.  5:pp.  16-17]  offers  a  scheme  for  dividing  this  disk  track  size 
into  four  record  sizes.  The  largest  of  the  four  record  sizes  is  used  in  the  dbm 
calculation.  The  largest  record  size  must  also  be  divisible  by  each  of  the  three 
smaller  record  sizes  to  simplify  the  database  sizing  process.  The  CAD  system 
implements  Strawser’s  scheme  by  having  the  four  record  sizes  as  fixed  multiples  of 
one  another. 

Vincent's  methodology  explains  the  requirement  for  the  database-multiple 
(dbm)  to  be  a  multiple  of  32  [Ref.  2:p.  44 j .  Basically,  the  number  32  is  necessary 
because: 


•  the  database  is  to  be  equaliy  divided  among  the  four  record  sizes,  i.e.. 
divisible  by  4. 

•  the  decision  to  have  the  smallest  database  (N/4)  to  be  one  fourth  the  size  the 
base  database  size  N.i.e..  divisible  by  another  4.  and 

•  a  parameter  relating  to  the  MBDS  storage  mechanism, (i.e..  each  MBDS 
cluster  must  hold  an  even  number  of  records)  i.e..  divisible  by  2. 

Thus.  4  x  4  x  2  =  32. 

The  final  criteria  for  how  large  the  database  size  may  be  is  of  course 
dependent  on  the  available  disk  storage  of  the  type  of  backend  to  be  used  in  the 
system.  Obviously,  for  testing  the  database  size  cannot  be  larger  than  the 
available  disk  storage  of  the  single  backend.  The  database-multiple  determines 
exactly  how  much  of  the  available  disk  storage  is  used  to  accommodate  the  largest 
database  size.  A  trivial  example  consists  of 
dbm  =  3  units, 

available  disk  storage  =  10  units,  and 
largest  database  size  =  9  units. 

Note  that  the  dbm  is  able  to  be  folded  into  the  available  disk  storage  3  times, 
thus  allowing  for  a  database  size  of  9.  If  the  dbm  was  4  units,  then  the  largest 
database  size  would  be  8.  A  more  concrete  example  follows.  A  system  is  defined 
to  have  the  following: 

available  disk  storage:  300  Megabytes 

disk  track  size:  4000  bytes 

number  of  backends:  3 


Strawser's  scheme  (based  on  the  disk  track  size!  for  selecting  record  sizes  is: 


large  record  size: 

2000 

bytes 

med-lg  record  size: 

1000 

bytes 

medium  record  size: 

400 

bytes 

small  record  size: 

200 

bytes 

The  calculations  are: 

lcm  =  6  {  1.2.  and  3  backends} 

dbm  =  lcm  x  32  x  large  record  size  =  6  x  32  x  2000  =  384.000  bytes 
X  in  Mbytes  =  299.904  where  384.000  can  be  folded  into  300  Mbytes 
781  complete  times 


Table  6  reflects  the  database  size  calculations  when  the  number  of  backends  is 
allowed  to  increase.  Note  the  dbm  for  11  backends.  The  dbm  itself  exceeds  the 
available  disk  storage  of  300  Mbytes!  Thus,  the  maximum  number  of  backends 
allowable  in  the  system  is  10.  The  database  design  factors  may  be  summarized  as 
follows: 


1)  three  database  sizes  that  best  measure  the  performance  claim  are  X.  X/2. 
and  X/4. 

2)  the  number  of  backends  in  the  system  is  used  to  calculate  the  least 
common  denominator,  lcm. 

3)  the  four  record  sizes  are  calculated  using  Strawser's  scheme.  These 
calculations  are  dependent  on  the  disk  track  size.  The  largest  record  size  is 
used  to  help  calculate  the  database-multiple. 

4)  the  database-multipie  is  calculated.  The  dbm  is  dependent  on  the  lcm  and 
the  largest  record  size. 


1^^ 

LCM  of 

{1 . Ml 

DBM  in 
Bytes 

■Jjllfl 

N:/4  in 
Mbytes 

2 

128.000 

299.904 

149.952 

74.976 

6 

384.000 

299.904 

149.952 

74.976 

12 

768.000 

299.520 

149.760 

5 

60 

3.840.000 

299.520 

149.760 

74.880 

6 

60 

3.840.000 

299.520 

149.760 

74.880 

7 

420 

26.880,000 

295.680 

147.840 

73.920 

8 

840 

53.760.000 

268.800 

134.400 

67.200 

9 

2.520 

161.280,000 

161.280 

80.640 

10 

2.520 

161.280,000 

161.280 

80.640 

11 

27.720 

1.774.080.000 

1.774.080 

443.520 

where: 

M  =  maximum  number  of  backends  in  the  database. 

LCM  =  Least  Common  Multiple. 

DBM  =  (LCM{  1.....M}  *  32  *  rec  size)  for  rec  size  =  2000-bytes. 
N  =  Size  in  Mbytes  of  large  test  database. 

N  2  =  Size  in  Mbytes  of  medium  size  test  database. 

S/4  =  Size  in  Mbytes  of  small  test  database. 

Assumption:  Largest  database  allowable  is  300  Mbytes. 


5)  the  values  of  N.  N/2,  and  N/4  are  calculated  based  on  the  dbm  and  the 
available  disk  storage. 

Vincent's  methodology  discusses  formatting  the  databases  using  one  of 
two  options:  (1)  the  use  of  only  one  record  type  per  database,  or  (2)  the  inclusion 
of  all  four  record  sizes  in  a  single  database.  Because  the  CAD  system  is  being 
designed  to  specifically  test  the  Naval  Postgraduate  School's  experimental 
software  multiple-backend  system,  option  (2)  is  implemented. 

3.  Test-Transaction  Mix  Considerations 

To  demonstrate  the  response-time  invariance  of  software  multiple- 
backend  systems,  the  CAD  system  must  ensure  that  any  increase  in  the  number  of 


backends  in  the  system  is  accompanied  by  a  proportional  increase  in  the  size  of 


a 


si 


the  database.  and  in  the  size  of  the  response  set  returned  by  the  test-transaction 
mix.  The  selection  of  the  test-transaction  mix  which  permits  the  database  size  to 
increase  in  the  same  proportion  as  the  increase  in  the  response  set  is  much  more 
complex.  The  selection  requires  a  complete  understanding  of  the  characteristics 
and  features  of  the  data  model  and  data  manipulation  language  [Ref.  2:p.  48]. 
Vincent  has  devised  a  methodology  that  creates  the  test-record  organization,  a 
test-database  structure,  and  a  test-transaction  mix  set  which  enables  the  system 
evaluator  to  use  the  same  organization,  structure,  and  mix  for  all  system 
configurations  without  modification! 

Vincent  also  cites  the  work  of  Hawthorn  and  Stonebreaker  [Ref.  6]  which 
suggests  the  use  of  three  sets  of  test  transactions  to  test  the  overall  performance 
of  MBDS.  One  set  consists  of  overhead- intensive  queries,  the  second  set 
consists  of  data-intensive  queries,  and  the  third  set  consists  of  multi-relation 
queries.  Vincent  ensures  that  all  of  these  factors  are  appropriately  considered 
when  selecting  transactions  to  verify  the  performance-gain  and  capacity-growth 
claims. 

4.  System-Dependent  Factors 

All  of  the  design  factors  mentioned  thus  far  have  been  independent  of  the 
system  with  the  exception  of  the  record-size  selection.  Vincent's  methodology  has 
shown  that  the  design  for  the  database  set  satisfies  the  requirements  for 
accommodating  all  required  test-system  configurations:  Vincent  has  also 
demonstrated  that  the  transaction-mix  generation  is  well  conceived  and 


appropriately  encompasses  the  requisite  considerations  for  verification  of  the 
performance-gain  and  capacity-growth  claims.  However,  there  are  two 
considerations  that  are  system  specific,  the  data  model  and  record  composition. 
The  data  model  used  by  the  database  system  is  directly  related  to  the  system's 
data  management  strategy,  to  include  such  factors  as  the  directory’  structure  and 
record  distribution  mechanism.  The  record  composition  may  rely  on  specific 
system  idioms  and  constraints,  such  as  limits  on  field  width,  or  on  the  number  of 
fields  within  a  record,  etc. 

B.  THE  MULTIPLE-BACKEND  DATABASE  SYSTEM  (MBDS) 

In  the  Laboratory  for  Database  Systems  Research,  a  prototyped  software 
multi-backend  database  system,  known  as  MBDS.  has  the  same  architectural 
organization  depicted  in  figure  3  for  the  software  multiple-backend  approach  to 
database  management.  One  minicomputer  or  microcomputer  functions  as  the 
controller,  while  one  or  more  microcomputers  and  their  disk  systems  serve  as 
backends.  Both  the  controller  and  the  backends  are  interconnected  by’  a 
broadcast  bus.  Together,  the  controller,  the  broadcast  bus.  and  the  backends 
comprise  a  database  system  which  is  specifically  designed  to  overcome  the  the 
performance  and  capacity-growth  problems  experienced  by  traditional 
mainframe- based  and  conventional  software  single-backend  database  systems. 
The  data  in  the  MBDS  system  is  distributed  across  the  disk  systems  of  each 
backend  computer.  Consequently,  a  user  transaction  may  be  executed 


simultaneously  by  all  backends.  The  initial  design  and  and  analysis  of  MBDS  is 
presented  in  Reference  7  and  Reference  8. 

The  MBDS  hardware  features  a  minicomputer  (VAX-11-750)  and  8  micro¬ 
computers  (Integrated  Solutions  Incorporated  workstations).  All  computer 
systems  utilize  the  4.2  BSD  Unix  Operating  System.  The  VAX  11-750  serves  as 
the  controller,  while  the  eight  ISI  workstations  function  as  backends.  The  ISI 
workstation  is  based  on  the  Motorola  68020  CPU,  which  features  16-Mbytes  of 
virtual  memory  space  per  process.  Each  backend  has  one  dedicated  Control  Data 
Corporation  Winchester-type  disk  drive  with  a  maximum  formatted  capacity  of 
500-Mbytes  per  drive.  The  MBDS  configuration  uses  Ethernet  as  the  broadcast 
bus  among  the  controlled  backends. 

Recall  there  are  two  considerations  that  are  system  specific,  the  data  model 
and  record  composition.  The  remainder  of  this  section  is  devoted  to  the  MBDS 
attribute-based  data  model  and  the  attribute-based  data  language. 

1.  The  Attribute-Based  Data  Model 

MBDS  is  based  on  the  attribute-based  data  model  first  proposed  in 
Reference  9.  In  the  attribute-based  data  model,  data  is  considered  in  the  following 
constructs:  database,  file,  record,  attribute-value  pair,  keyword,  attribute-value 
range,  directory  keyword,  non-directory  keyword,  directory,  record  body,  keyword 
predicate,  and  query.  Informally,  a  database  consists  of  a  collection  of  files.  Each 
file  contains  a  group  of  records  which  are  characterized  by  a  unique  set  of 
keywords.  A  record  is  composed  of  two  parts.  The  first  part  is  a  collection  of 


at  tribute- value  pairs  or  keywords.  An  attribute-value  pair  is  a  member  of 
the  Cartesian  product  of  the  attribute  name  and  the  value  domain  of  the 
attribute.  As  an  example.  <POPULATION,  25000>  is  an  attribute-value  pair 
having  25000  as  the  value  for  the  population  attribute.  A  record  contains  at  most 
one  attrioute-value  pair  for  each  attribute  defined  in  the  database.  Certain 
attribute-value  pairs  of  a  record  (or  a  file)  are  called  the  directory  keywords  of 
the  record  (file),  because  either  the  attribute-value  pairs  or  their  attribute-value 
ranges  are  kept  in  a  directory  for  identifying  the  records  (files).  Those 
attribute-value  pairs  which  are  not  kept  in  the  directory  are  called  non- 
directory  keywords.  The  rest  of  the  record  is  textual  information,  which  is 
referred  to  as  the  record  body.  An  example  of  a  record  is  shown  below. 

(  <FILE.  USCensus>.  <CITY,  Monterey>,  <POPULATION.  25000>, 

{  Temperate  climate  }  ) 

The  angle  brackets.  <.>.  enclose  an  attribute-value  pair,  i.e..  keyword.  The  curly 
brackets,  {,},  include  the  record  body.  The  first  attribute-value  pair  of  all  records 
of  a  file,  by  convention,  is  the  same.  In  particular,  the  attribute  is  FILE  and  the 
value  is  the  file  name.  A  record  is  enclosed  in  the  parenthesis.  For  example,  the 
above  sample  record  is  from  the  USCensus  file. 

The  records  of  the  database  may  be  identified  by  keyword  predicates.  A 
keyword  predicate  is  a  3-tuple  consisting  of  a  directory  attribute,  a  relational 
operator  (  =  .  !=,  >,  <,  $).  and  an  attribute  value,  e.g.,  POPULATION  ^  20000 

is  a  keyword  predicate.  More  specifically,  it  is  a  greater-than-or-equal-to 


predicate.  Combining  keyword  predicates  in  disjunctive  normal  form 
characterizes  a  query  of  the  database.  The  query 


(  FILE  =  USCensus  and  CITY  =  Monterey  )  or 
(  FILE  =  USCensus  and  CITY  =  San  Jose  ) 

will  be  satisfied  by  all  records  of  the  USCensus  file  with  the  CITY  of  either 
Monterey  or  San  Jose.  For  clarity,  we  also  employ  parentheses  for  bracketing 
conjunctions  in  a  query . 

2.  The  Attribute- Based  Data  Language  (ABDL 


The  attribute-based  data  language  supports  the  five  primary  database 
operations,  INSERT.  DELETE,  UPDATE.  RETRIEVE,  and  RETRIEVE- 
COMMON.  A  request  in  the  ABDL  is  a  primary  operation  with  a  qualification. 
A  qualification  is  used  to  specify  the  part  of  the  database  that  is  to  be  operated 
on.  Two  or  more  requests  may  be  grouped  together  to  form  a  transaction. 
Now,  let  us  illustrate  the  five  types  of  requests  and  forgo  their  formal 
specifications. 

The  INSERT  request  is  used  to  insert  a  new  record  into  the  database. 
The  qualification  of  an  INSERT  request  is  a  list  of  keywords  with  or  without  a 
record  body  being  inserted.  In  the  following  example,  an  INSERT  request  that 
INSERT  UFILE.  USCensus  -  CITY.  Cumberland  >.  < POPULATION.  40000>) 
will  insert  a  record  without  a  record  body  into  the  USCensus  file  for  the  city 
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Cumberland  with  a  population  of  40.000. 


A  DELETE  request  is  used  to  remove  one  or  more  records  from  the 


database.  The  qualification  of  a  DELETE  request  is  a  query.  The  following 
example, 

DELETE  ((FILE  =  USCensus)  and  (POPULATION  >  100000)) 

is  a  request  that  will  delete  all  records  whose  population  is  greater  than  100.000  in 
the  USCensus  file. 

An  UPDATE  request  'is  used  to  modify  records  of  the  database.  The 
qualification  of  an  UPDATE  request  consists  of  two  parts,  the  query  and  the 
modifier.  The  query  specifies  which  records  of  the  database  are  to  be  modified. 
The  modifier  specifies  how  the  records  being  modified  are  to  be  updated.  The 
following  example, 

UPDATE  (FILE  =  USCensus)  (POPULATION  =  POPULATION  -  5000) 

is  an  UPDATE  request  that  will  modify  all  records  of  the  USCensus  file  by 
increasing  all  populations  by  5.000.  In  this  example.  (FILE  =  USCensus)  is  the 
query  and  (POPULATION  =  POPULATION  4-  5000)  is  the  modifier. 

The  RETRIEVE  request  is  used  to  retrieve  records  of  the  database.  The 
qualification  of  a  retrieve  request  consists  of  a  query,  a  target-list,  and  a  by-clause. 
The  query  specifies  which  records  are  to  be  retrieved.  The  target-list  consists  of  a 
list  of  output  attributes.  It  may  also  consist  of  an  aggregate  operation,  i.  e.. 
AVG.  COUNT.  SUM.  MIN.  MAX.  on  one  or  more  output  attribute  values.  The 
optional  by-clause  may  be  used  to  group  records  when  an  aggregate  operation  is 


optional  bv-clause  may  be  used  to  group  records  when  an  aggregate  operation  is 
specified.  The  RETRIEVE  request. 

RETRIEVE  ((FILE  =  USCensus)  and  (POPULATION  ^  50000))  (CITY,  POPULATION) 

will  retrieve  the  city  names  and  populations  of  all  records  in  the  USCensus  file 
whose  populations  are  greater  than  or  equal  to  50,000.  ((FILE  =  USCensus)  and 
(POPULATION  ^  50.000))  is  the  query  and  (POPULATION,  CITY)  is  the 
target-list.  There  is  no  use  of  the  by-ciause  or  aggregation  in  this  example. 

Lastly,  the  RETRIEVE-COMMON  request  is  used  to  merge  two  files  by 
common  attribute-values.  Logically,  the  RETRIEVE-COMMON  request  can  be 
considered  as  a  transaction  of  two  retrieve  requests  that  are  processed  serially  in 
the  following  general  form. 

RETRIEVE  (query-1)  (target-list- 1) 

COMMON  (attribute-1,  attribute-2) 

RETRIEVE  (query-2)  (target-list-2) 

The  common  attributes  are  attribute- 1  (associated  with  the  first  retrieve  request) 

and  attribute-2  (associated  with  the  second  retrieve  request).  In  the  following 

example,  the  RETRIEVE-COMMON  request 

RETRIEVE  ((FILE  =  CanadaCensus)  and  (POPULATION  >■  100000))  (CITY) 

COMMON  (POPULATION.  POPULATION) 

RETRIEVE  ((FILE  =  USCensus)  and  (POPULATION  >  100000))  (CITY) 

will  find  all  records  in  the  CanadaCensus  file  with  population  greater  than 
100.000.  find  all  records  in  the  USCensus  file  with  population  greater  than 
100,000,  identify  records  of  respective  files  whose  population  figures  are  common. 


and  return  the  two  city  names  whose  cities  have  the  same  population  figure^ 
ABDL  provides  five  seemingly  simple  database  operations,  which  are  nevertheless 
capable  of  supporting  complex  and  comprehensive  transactions. 

C.  THE  CAD  FRAMEWORK 

The  CAD  system  is  being  implemented  in  a  number  of  versions.  The  first 
version,  described  in  this  thesis,  generates  the  test-databases,  the  test  transaction 
mixes,  and  an  evaluator's  guide,  but  is  not  integrated  with  MBDS.  The  databases, 
the  mixes,  and  the  guide  are  in  fact  the  major  components  of  the  CAD  system. 
The  framework  of  this  first  version  is  to  create  a  "TEST"  directory  of  files  for  the 
system  evaluator  to  manually  input  to  MBDS.  The  files  that  are  created  by  the 
CAD  system  and  passed  to  the  TEST  directory  represent  both  the  generated  test 
databases  and  the  generated  transaction  mixes.  The  system  evaluator  is  assisted 
in  his  testing  by  the  guide  provided  by  the  CAD  system. 

The  framework  of  the  TEST  director,-  rnr  *  MBDS  configured  vith  up  to  3 
backends  is  represented  in  Figure  4.  Recalling  that  a  3-backend  system  requires  a 
total  of  5  configurations  per  database,  i.e.. 

for  M  =  3.  the  number  of  configurations  =  2M  -  1  =  2(3)  -1=5. 

Each  database  (small  (N/4).  medium  (N/2).  and  large  (N))  has  its  own  set  of 
configuration  files  as  well  as  it  own  transaction-mix  files.  The  configuration  files 
are  made  up  of  record  files,  and  are  in  fact  the  generated  database.  The 
transaction-mix  file  is  used  to  test  the  performance  and  growth-capacity  claims  for 
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1.  Characteristics  and  Notions  of  the  CAD  System 

The  CAD  system  is  designed  to  receive  a  minimal  amount  of  input  from 
the  user.  Upon  receiving  3  essential  elements  of  information  from  the  user,  the 
CAD  system  is  to  perform  the  following  actions: 


•  to  make  a  number  of  calculations  that  affect  the  generation  of  the 
database  sets  and  the  transaction  mixes. 

•  to  create  the  template  file,  the  3  descriptor  files  (one  for  each  database 
size  N,  N/2,  N/4),  the  three  response-time  reduction  record  files  (again, 
one  for  each  database  size),  and  the  requisite  number  of  response-time 
invariance  record  files. 

•  and  to  produce  an  evaluator’s  guide  as  a  very  important  by-product. 
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The  evaluator's  guide  is  a  report  that  characterizes  the  generation  of  the  test 
databases  and  transaction  mixes.  Also,  the  guide  is  the  user’s  tool  to  place  the 
generated  CAD  files  into  MBDS  for  testing.  The  most  powerful  characteristic  of 
the  CAD  system  is  that  just  about  every  separate  entity  within  the  CAD  system 
is  in  some  form  or  multiple  of  another  entity.  Within  the  CAD  system  there  are 
two  important  concepts  that  serve  as  the  nucleus  for  aiding  in  the  creation  of  all 
of  the  files  that  are  placed  in  the  TEST  directory. 

The  first  concept  involves  the  creation  of  a  number  of  factors  that  serve 
as  multipliers  for  entities.  For  example,  three  database  sets  are  generated  by  the 
CAD  system  with  two  sets  being  multiples  of  the  original,  e.g..  X.  N/2.  and  N/4. 
The  first  concept  is  to  create  a  database  factor  to  represent  the  relationships  (in 
terms  of  multiples)  among  the  database  sizes.  In  particular,  a  data  construct 
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named  dbase  factor  represents  the  ratios  of  the  database  sizes.  In  our 
implementation,  dbase  factor  is  an  array  of  three  integers.  1.2  and  4.  The  utility 
of  this  notion  is  that  the  CAD  system  merely  creates  one  database  and  does  so 
with  the  dbase  factor  set  to  "1".  The  other  two  databases  are  generated  by 
simply  accessing  the  appropriate  dbase  factor  from  the  array.  We  also  apply  the 
factor  concept  to  represent  relationships  among  database  record  sizes.  Recall  that 
Stawser's  scheme  for  the  record-size  selection  has  3  of  the  record  sizes  as  multiples 
of  the  4th  record  size.  A  record  factor  is  created  to  represent  the  relationships 
between  the  record  sizes.  Again,  in  particular,  a  data  construct  named 
record  factor  represents  the  ratios  of  the  record  sizes.  In  our  implementation, 
recordfactor  is  an  array  of  four  integers,  1.  2.  5  and  10. 

The  second  concept  is  a  table  of  numbers  that  reflects  the  distribution  of 
records  across  the  backends  of  MBDS  .  Recall  that  in  MBDS.  a  cluster-based 
database  placement  algorithm  is  used  to  distribute  clusters  of  records  across  the 
backends,  and  that  this  algorithm  is  tightly  coupled  to  the  attribute-base  data 
model.  Therefore,  we  develop  a  method  by  which  we  can  successfully  model  the 
algorithm  to  insure  an  even  distribution  of  database  records  of  a  cluster  across  the 
backends.  Hence,  we  instantiate  a  data  structure  that  can  be  used  to  model  the 
concept.  The  name  given  to  this  data  structure  is  the 
base  record  &  block  distribution  tabie.  The  numbers  placed  in  this 
"base"  table  result  from  a  few  simple  calculations  dependent  on  the  user's  input. 
The  numbers  reflect  the  upper  bounds  on  the  number  of  records,  blocks,  and 
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clusters  formed  and  distributed  across  the  backends  of  MBDS.  The  values  in  the 
table  are  used  to  generate  the  appropriate  number  of  records  needed  for  a  pre¬ 
calculated  database  size.  They  are  also  used  to  determine  the  range  values  of  the 
descriptor  files,  and  they  serve  to  determine  values  in  the  transaction  mixes  as 
well. 

The  next  chapter  presents  these  two  concepts  as  the  building  blocks  for 
two  of  the  major  components  of  the  CAD  system,  the  generated  test  database  sets 
and  the  generated  test-transaction  mixes. 
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A.  HIGH-LEVEL  ORGANIZATION  OF  THE  CAD  SYSTEM 


For  discussion  the  high-level  organization  of  the  CAD  system  is  presented  as 
three  areas:  (l)  the  characteristics  of  CAD,  (2)  the  files  created  by  CAD,  and  (3) 
the  high-level  program  structure  of  CAD.  The  characteristics  of  the  CAD  describe 
the  mechanisms  by  which  the  values  found  within  the  files  are  determined.  The 
created  files  are  in  fact  the  generated  test  databases  and  the  transaction  mixes, 
and  comprise  a  part  of  the  evaluator's  report.  The  high-level  program  structure 
provides  an  overview  of  the  CAD  system  itself. 

1.  Characteristics  of  the  CAD  System 

Chapter  II  introduced  the  two  concepts  that  serve  as  the  nucleus  for 
creating  all  of  the  files  placed  in  the  TEST  directory.  The  first  concept  involved 
the  creation  of  a  number  of  factors  that  serve  as  multipliers  for  entities  found 
within  the  CAD  system.  The  second  concept  involved  the  creation  of  a  base  table 
that  reflects  the  distribution  of  records  across  the  backends  of  MBDS.  This  section 
presents  these  two  concepts  as  the  building  blocks  for  the  generation  of  all  files 
created  by  the  CAD  system. 

a.  The  Creation  of  the  CAD  Factors 

The  purpose  of  the  CAD  system  is  to  generate  the  template  file,  the 
descriptor  files,  the  record  files,  and  the  transaction-mix  files.  A  detailed 
discussion  of  these  files  is  presented  later.  The  first  three  sets  of  files  are 
indicative  of  the  attribute-based  data  model.  The  transaction-mix  files  are 
generated  for  benchmarking  purposes.  The  values  placed  in  the  template  files. 


descriptor  files,  the  record  files,  and  the  Transaction-mix  files  are  characteristically 
dependent  upon 

•  the  size  of  the  database  (small,  medium,  or  large),  and 

•  the  record  class  (large,  medium-large,  medium,  or  small). 

Vincent’s  methodology  [Ref.  2]  defines  the  ratios  among  the  database  sizes,  and 
Strawser  [Ref.  5]  defines  the  ratios  among  the  record  classes.  To  represent  these 
two  sets  of  ratios,  two  data  constructs  are  introduced.  The  data  construct 
dbasefactor  represents  the  ratios  of  the  database  sizes.  In  our  implementation, 
dbasefactor  is  an  array  of  three  integers,  1,  2,  and  4.  The  data  construct 
recordfactor  represents  the  ratios  of  the  record  classes.  In  our  implementation, 
recordfactor  is  an  array  of  four  integers,  1,  2,  5.  and  10.  Both  factors  are  used  to 
generate  tables  of  data  found  in  the  user’s  guide,  to  perform  the  calculations 
necessary  to  create  the  base-record-and-block-distribution  table,  and  to  perform 
the  calculations  for  the  values  of  the  transaction  mixes. 

b.  The  Base-Record-and-Block-Distribution  Table 

The  base-record-and-block-distribution  table,  hereby  referred  to  as 
the  base  table,  represents  the  records,  blocks,  and  clusters  formed  and 
distributed  across  the  backends  of  MBDS.  Vincent’s  methodology  [Ref.  2:p.  68) 
defines  nine  cluster  categories,  with  each  cluster  category  representing  a  cluster 
containing  from  2  to  10  blocks  of  records  per  cluster.  Thus,  the  base  table  reflects 
the  distribution  of  records,  blocks,  and  clusters  across  9  categories.  Table  7  depicts 


the  layout  of  the  base  table. 
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TABLE  7.  BASE  RECORD  AND  BLOCK  DISTRIBUTION  TABLE. 

(BASE  TABLE) 


The  values  for  the  matrix  are  dependent  on  a  few  initial  calculations  which  in 
turn  are  dependent  on  the  user’s  input.  The  nature  of  these  calculations  is 
discussed  later.  However,  it  is  appropriate  to  note  which  columns  are  correlated  to 
dbase  factor  and  record  factor, 
column  1:  record  _factor 
column  2:  remains  constant 
column  3:  record  factor 
column  4:  dbasefactor 
column  5:  record  factor  and  dbase  factor 
column  6:  dbase  factor 

column  7:  dbase  factor,  number  of  backends,  and  configuration  number 

2.  CAD-Generated  Files 

This  section  introduces  the  files  created  by  the  CAD  system  for  each  of 
the  three  major  components.  The  Database  Component  has  three  types  of  files 
generated  to  represent  the  three  databases: 

•  the  Template  file-  one  template  file  common  to  all  three  databases; 

•  the  Descriptor  file-  three  descriptor  files,  one  per  database:  and 
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•  the  Record  file-  a  number  of  record  files  l  dependent  on  the  number  of 
backends  in  the  system)  representing  both  the  response-time- reduct  ion  and 
the  response-time-iiivariance  configurations. 

The  Transaction-Mix  Component  generates  one  type  of  file  : 

•  the  transaction-mix  file. 

All  together,  the  CAD  system  generates  twelve  transaction  mix-files,  four  files  per 
database.  Each  of  the  four  files  per  database  represent  24  transactions  for  a  given 
record  class. 

The  Evaluator's  Report  Component  is  comprised  of  two  types  of  files: 

•  the  standard-text  files 

•  the  empirical-data  files. 

The  standard-text  files  present  a  narrative  for  the  evaluator,  providing 
instructions  on  how  to  interface  the  CAD  generated  test-database  and  test- 
transaction-mix  files  with  MDBS.  The  text  files  also  present  a  discussion  for 
interpeting  and  analyzing  the  empirical  data  calculated  by  the  CAD  system.  The 
empirical  data  files  are  a  collection  of  tables  that  reflect  information  about  the 
generated  test-databases  and  generated  test-transaction  mixes  based  upon  the 
user's  input.  A  more  detailed  explanation  of  each  of  the  files  is  presented  in  the 
discussion  of  its  associated  component. 

3.  The  High-Level  Program  Structure  of  the  CAD  System 

The  CAD  system  has  been  designed  utilizing  a  top-down  strategy.  The 
system  is  entitled  as  the  CAD  Benchmark  System.  At  the  highest  level.  5 


subordinate  tasks  are  performed: 

•  Initialization. 

•  Receiving  User  Input. 

•  Initial  Calculations, 

•  Creating  Files,  and 

•  Cleanup. 

The  first  3  tasks  initialize  the  state  of  the  CAD  system.  Within  the  4th  task,  the 
major  components  of  the  CAD  system  are  created.  The  5th  task  performs  a 
cleanup  of  all  temporary  files  and  allocated  memory  created  by  the  CAD  system 
once  the  CAD  system  has  finished. 

The  high-level  program  structure  can  be  described  in  three  phases.  The 
first  phase  is  the  preparation  phase  which  encompasses  the  initialization  of  data 
structures,  the  solicitation  of  input  data  from  the  user,  and  the  initial 
calculations.  The  second  phase  is  the  files  creation  phase  which  encompasses  the 
generation  of  all  the  requisite  test-database  files,  the  test-transaction-mix  files, 
and  the  report  files.  The  third  phase  is  the  cleanup  phase  which  encompasses 
system  commands  to  purge  all  files  no  longer  needed  once  the  CAD  system  has 
completed  its  objective. 

a.  The  Preparation  Phase 

The  Preparation  Phase  begins  the  processing  of  the  CAD  system. 
Three  tasks  must  be  preformed  prior  to  creating  any  of  the  three  major 
components.  Each  task  is  discussed  separately. 


(1)  Initialization.  The  initialization  encompasses  definition 
assignments,  array  initialization,  and  the  declaration  of  data  types.  Variables  are 
both  global  and  local.  Only  the  more  important  features  of  the  initialization  are 
addressed.  The  discussion  covers  the  definitions,  the  arrays,  key  variables,  and  a 
structure  representing  the  base  table. 

The  more  important  definitions  are  those  that  reflect  certain 
known  factors  about  the  methodology  being  implemented.  These  definitions  are 
global  and  are  shown  in  table  8.  The  maximum  number  of  backends  h?s  been 
chosen  to  be  10.  This  is  not  a  major  constraint  on  the  CAD  system.  Fixing  the 
maximum  number  of  backends  has  provided  for  fewer  algorithms  to  be  designed, 
coded  and  tested. 

The  more  important  arrays  are  those  that  hold  information  that 
is  common  to  each  of  the  major  components.  Two  sets  of  arrays  hold  such 
information.  One  set  of  arrays  holds  the  key  attribute  names  and  the  other  set  of 
arrays  holds  the  initial  calculations.  These  arrays  are  also  global  and  are  shown  in 
table  9. 

One  other  array  initialization  important  to  the  initial  calculations 
is  the  ’.east-common-multiple  array  known  as  the  1cm  table.  The  1cm  table 
stores  the  least-common-multiple  values  for  the  integers  1  to  10.  Recall  that  the 
system  calculates  the  least  common  multiple  for  the  number  of  backends  in  the 
system  and  the  maximum  number  of  backends  presently  allowed  is  10. 


TABLE  8.  IMPORTANT  GLOBAL  DEFINITIONS 


ASSIGNMENT 


DESCRIPTION 

NAME 

Number  of 

Cluster  categories 

num  clus  cat 

Number  of 
database  sizes 

num  db  sizes 

Number  of 
record  sizes 

num  red  sizes 

_ 
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(3)  Perform  Initial  Calculations.  Eight  calculations  are  pertormed 
within  the  Initial  Calculations  task.  Five  of  the  calculations  are  functions  that 
return  values  assigned  to  global  variables.  The  remaining  three  calculations  are 
procedures  that  determine  values  for  the  three  global  arrays.  Each  of  the 
calculations  is  followed  by  an  explanation. 

The  first  calculation  concerns  the  number  of  configurations.  The 
number  of  configurations  is  equal  to  twice  the  number  of  backends  less  one. 


THE  NUMBER  OF  CONFIGURATIONS 


Function 


Variable 

Function  name 

Parameters  Passed 

num  config 

Calc  config 

num  be 

num  config  =  2  x  num  be  -  1 

j.ne  second  calculation  concerns  tne  least  common  munipie.  oy 
accessing  the  lcmtable  array  in  Table  9.  the  value  of  num  be  being  returned  is 
the  least  common  multiple,  e.g..  if  num  be  =  3.  the  value  returned  from  the  array 
is  6. 
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THE  LEAST  COMMON  MULTIPLE 


Function 


Variable 

Function  name 

Parameters  Passed 

1cm 

Calc  1cm 

num  be 

The  third  calculation  concerns  the  four  record  sizes.  The  first  of 
the  four  records  sizes  is  1/2  the  disk  track  size.  The  remaining  three  sizes  are 
multiples  of  the  first.  Each  remaining  record  size  is  calculated  by  dividing  the  first 
record  size  by  the  respective  record  factor  (the  rcd  factor  array  in  Table  9). 


THE  FOUR  RECORD  SIZES 


Procedure 

array 

Procedure  name 

Parameters  Passed 

BB 

Four  record  sizes() 

dsk  trk  sz 

The  fourth  calculation  concerns  the  available  disk  storage.  The 
available  disk  storage  is  80%  of  the  maximum  disk  storage.  20%  of  the  disk 


storage  is  reserved  for  the  MBDS  directory. 
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THE  AVAILABLE  DISK  STORAGE 


F  unction 


Variable 

Function  name 

avail  dsk  storage 

Calc  avail  dsk  storage 

The  fifth  calculation  concerns  the  database  multiple.  The 
database  multiple  is  calculated  by  multiplying  the  least  common  mutiple  by  the 
number  32  by  the  largest  record  size  (from  four_rcd_sz  array  of  Table  9). 
Chapter  two  presented  the  calculation  for  the  database  multiple  in  detail. 


THE  DATABASE  MULTIPLE 


Function 


Variable  Function  name  Parameters  Passed 


dbm  Calc  database  multiple  1cm 


dbm  =  1cm  x  32  x  four  rcd_szjlg  jrcd  sz;. 
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Procedure 


Arrav  Procedure  name  Parameters  Passed 


Thus,  the  largest  database  size  equals  the  database  multiple  multiplied  by  the 


number  of  folds: 


db  size[largest]  =  dbm  x  folds. 


The  remaining  two  databases  (medium  and  small)  are  calculated  by  dividing  the 
largest  database  size  by  the  respective  database  factor. 

The  seventh  calculation  concerns  the  number  of  attributes  per 
record  size.  The  number  of  attributes  per  record  size  is  calculated  by  dividing  the 
record  size  by  the  numeric  common  divisor  10.  The  reason  for  this  calculation  is 
explained  in  the  Generated  Test-Database  Component  section. 


THE  NUMBER  OF  ATTRIBUTES  PER  RECORD  SIZE 


Procedure 


Procedure  name 


num  attr  per  red  szi  Calc  num  aur  for  ea  red  sz 


Parameters  Passed 


VMVCVC 


The  eighth  calculation  concerns  the  key  number  of  records.  Tin 
key  num  reds  is  a  variable  that  identifies  the  number  of  records  in  the  largest 
record  class  of  the  smallest  database.  This  number  of  records  is  considered  to  be 
the  key  because  every  other  record  class  size  in  each  of  the  databases  is  a  multiple 
of  this  number.  The  key  num  reds  variable  is  the  determining  factor  for  al] 
calculations  made  with  respect  to  the  base  record  and  block  distribution  table. 
Once  the  initial  calculations  are  made  the  Create  Files  Phase  begins. 


THE  KEY  NUMBER  OF  RECORDS 


Function 

Variable 

Function  name 

Parameters  Passed 

key  num  reds 

Calc  key  num  records 

key  num  reds  = 

(dbase  szsmall  num  red  classes)  four  red  szdargei. 

b.  The  Create-Files  Phase 

The  five  modules  of  the  Create  Files  Phase  comprise  the  creation  of 
the  three  major  components  of  the  CAD  system:  the  generation  of  the  test- 
database  files,  the  generation  of  the  test-transaction-mix  files,  and  the  generation 
of  the  report  files.  Most  of  the  report  files  are  generated  simultaneously  with  the 


other  two  files.  There  is  however  one  module  dedicated  solely  to  the  generation  of 


one  specific  report.  This  section  present-  riie  high-level  modules  invoked  for  each 
of  the  major  files. 

(1)  Generating  the  Test-Database  Files.  Upon  completion  of  the 
initial  calculations,  the  CAD  system  begins  the  generation  of  the  test-database 
files.  Three  high-level  modules  are  invoked: 

•  Maketemplatefile, 

•  Make_descriptor_files, 

•  Make_record  files. 

Each  of  these  modules  in  turn  calls  subordinate  modules  which  generate  the  the 
appropriate  requisite  files.  The  details  of  these  modules  are  presented  in  the  Test- 
Database-Generation  Component. 

(2)  Generating  the  Transaction-Mix  Files.  Upon  completion  of  the 
test-database  files,  the  CAD  system  begins  the  generation  of  the  test-transaction- 
mix  files.  Just  one  high-level  module  is  invoked  for  this  task: 

•  Generate  trans  mixes. 

The  details  of  this  module  are  presented  in  the  Test-Transaction  Mix  Generation 
Component. 

(3)  Generating  the  Report  Files.  The  CAD  system  invokes  just  one 
module  to  generate  a  specific  report  which  is  placed  into  3  files.  The  module  being 
invoked  is: 


•  Format  test  benchmark  databases. 
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The  specific  report,  which  has  its  own  dedicated  module,  provides  the  user  with 
three  tables  of  information.  Each  table  is  a  file  and  is  indicative  of  a  database. 
The  information  in  the  table  depicts  the  test  configurations  for  a  specific  database 
size.  Table  10  depicts  the  test  configurations  for  a  small  database.  The  system 
parameters  resulting  in  a  specific  configuration,  for  example,  are: 


available  disk  storage 
disk  track  size 
number  of  backends 


300  Megabytes 
4000  bytes 
3 


All  of  the  other  report  files  are  comprised  of  information  depicting  data  used  in 
calculations,  in  relationships,  and  for  attribute  values.  These  report  files  are 
generated  concurrently  as  the  data  is  being  generated  within  either  of  the  other 
two  major  components.  These  report  files  include: 

•  the  records-per-block-relationship  table. 

•  the  number-of-records-per-cluster-category  table. 

•  the  record- A:-block-distribution  table  for  each  database. 

•  4  transact  ion- mix  files  per  database,  and 

•  4  transaction-mix  workload  files  per  database. 
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The  standard  text  files  are  not  generated  by  the  CAD  system.  These  files  are 
actually  part  of  the  CAD  system  and  are  interleaved  with  the  generated  report 
files  when  the  evaluator's  report  manual  is  assembled, 
c.  The  Cleanup  Phase 

The  cleanup  phase  is  comprised  of  a  number  of  system  commands  to 
purge  all  of  the  generated  CAD  files  once  the  user  has  no  further  need  for  them. 
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TABLE  10.  SMALL  DB  TEST  CONFIGURATIONS 


Configuration 

Number 

Number 

of 

Backends 

Record 
Site  in 

Byiet 

Number  o' 
Records  per 
B&ckend 

My  bytes 
per 

Backend 

Database 
Size  in 
Mbytes 

i 

i 

2000 

9372 

18744000 

1000 

18744 

18744000 

400 

46860 

200 

93720 

18744000 

2 

2 

2000 

4686 

9372000 

1000 

9372 

9372000 

400 

23430 

9372000 

200 

46860 

9372000 

O 

1/ 

9 

2000 

■  - 

6248000 

1000 

HiHHy  >.  * 

6248000 

400 

6248000 

200 

6248000 

74.976 

4 

O 

2000 

9372 

18744000 

1000 

18744 

18744000 

400 

46860 

18744000 

200 

93720 

18744000 

149.952 

5 

O 

2000 

9372 

18744000 

1000 

18744 

18744000 

400 

46860 

18744000 

200 

93720 

18744000 

224.928 

B.  THE  TEST-DATABASE-GENERATION  COMPONENT 

The  objective  of  the  database  component  is  to  pass  to  a  TEST  directory  all  of 
the  necessary  files  that  represent  the  three  database  sizes,  small,  medium  and 
large.  Recall  that  the  CAD  constructs  a  tree  directory  for  indexing  the  report  and 
for  each  of  the  respective  databases  and  its  associated  database  and  transaction- 
mix  files.  With  respect  to  the  generated  test-database,  the  directory  contains  the 
three  types  of  database  files  described  by  the  attribute-based  data  model  and 
utilized  by  MBDS.  The  three  types  of  files  are  the  template  file,  the  descriptor 


file,  and  the  record  file.  These  files,  generated  by  the  CAD  system,  are  manually 


The  database  component  is  described  in  two  parts.  The  first  part  describes 
the  generated  test-database  files  and  the  second  part  addresses  the  database 
program  modules. 

1.  The  Generated  Database  Files 

The  CAD  system  generates  three  database  sets,  a  large  database,  a 
medium  database,  and  a  small  database.  Each  database  may  be  described  in 
terms  of  the  generated  files  that  represent  it.  i.e..  the  template  file,  the  descriptor 
file,  and  the  record  files. 

a.  The  Template  File 

Each  of  the  database  sets  (small,  medium,  and  large)  shares  a 
common  single  file  known  as  the  template  file.  This  file  contains  four  independent 
templates,  with  each  template  associated  with  a  different  record  class.  Each 
template  contains  the  names  of  both  the  directory  and  non-directory  attributes, 
and  reflects  the  structure  and  the  type  of  each  of  the  record  classes.  By  class  we 
recall  that  Strawser  [Ref.  5]  creates  four  record  classes,  large,  medium-large, 
medium  and  small.  By  type  we  mean  the  type  of  the  attribute  values,  i.e..  either 
string  (s).  integer  (i).  or  floating  number  (f). 

b.  The  Descriptor  File 

Each  of  the  three  database  sets  has  its  own  unique  descriptor  file. 
The  descriptor  file  contains  indexing  information  for  each  directory  attribute  in 
the  database.  This  indexing  information  takes  two  distinct  forms.  First,  a 
directory  attribute  may  be  described  by  indexes  which  represent  a  list  of  the 
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possible  values  that  the  directory  attribute  may  be  assigned  la  type-B  attribute!. 
Second,  the  director.’  attribute  may  be  described  by  indexes  which  represent 


attribute-value  ranges  (a  type- A  attribute).  Each  descriptor  file  identifies  the 
directory  attributes:  the  template  attribute,  and  the  descriptor  attributes.  The 


template  attribute  receives  as  a  value  one  of  four  template  names  describing  the 
record  size  class:  template  large  (Templg),  template  medium-large 
(Tempmedlg),  template  medium  (Tempmed),  or  template  small 
(Tempsmall). 

The  names  given  to  the  descriptor  attributes  identify  the  descriptor 
attribute  itself,  and  the  record  class  associated  with  it.  Thus  the  names  for  the 
descriptor  attribute  is  composed  of  three  parts.  The  first  part  is  entitled  INT 
meaning  integer.  The  second  part  is  entitled  either  ONE  or  TWO.  ONE 
associates  the  descriptor  attribute  to  the  nine  cluster  categories.  TWO  associates 
the  descriptor  attribute  to  the  number  of  records  per  cluster.  The  third  part 
associates  the  descriptor  attribute  to  the  record  class.  LARGE  (LG).  MEDIUM 
LARGE  (MEDLG),  .  MEDIUM  (MED).  OR  SMALL  (SMALL).  All 
together,  there  are  eight  possible  attribute  names: 


INTONELG 


INTTWOLG 


INTONEMED  LG  INTTWOMEDLG 


INTONEMED  INTTWOMED 


INTONESMALL  INTTWOSMALL 


(KM 
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Nine  sets  of  values  are  assigned  to  the  INTON'Exxx  descriptor  attributes,  each  set 
representing  the  range  values  associated  with  a  specific  cluster  category.  The  set 
of  values  assigned  to  the  INTWOxxx  descriptor  attributes  are  determined  by  the 
maximum  range  value  found  in  the  corresponding  INTONExxx  attribute,  and 
the  number  of  records  per  cluster  for  that  specific  cluster  category.  For  example, 
given  a  database  that  has  as  part  of  its  record  configuration  9372  2000-bvte 
records,  table  11  reflects  the  range  values  for  each  of  the  descriptor  attributes, 
c.  The  Record  File 

Two  distinct  classes  of  record  files  are  generated  by  the  CAD  system. 
The  first  class  of  record  fibs  represents  the  response-time-reduction  (rtr) 
configurations.  Each  database  (small,  medium,  and  large)  has  its  own  unique  rtr 
record  file.  The  rtr  file  is  associated  with  all  of  the  rtr  configurations  of  a  given 
database  because  the  size  of  that  database  does  not  change. 

The  second  class  of  record  files  represents  the  response- time- 
invariance  (rti)  configurations.  Each  database  has  its  own  set  of  response- time- 
invariance  record  files.  Each  rti  record  file  is  associated  with  a  specific  rti 
configuration.  Recall  from  chapter  IT  that  the  number  of  rti  configurations  is 
dependent  on  the  number  of  backends.  We  should  note  however  that  the  number 
of  rti  record  files  per  database  set  is  the  same. 

The  record  file  contains  the  information  the  user  desires  to  have 
stored  in  the  database.  The  information  the  CAD  system  places  in  each  record 


takes  five  distinct  forms.  Each  form  is  associated  with  a  specific  type  of  attribute. 


table  ii 


INTONELG  ATTRIBUTE  VALUE  RANGES. 
INTTWOLG  ATTRIBUTE  VALUE  RANGES. 

INTONELG  INTTWOLG 
Range  Range 

of  V  alues  of  V  aluet 


; 867;  1,562!  1867:8741 

1875;  882] 

il, 555:1,562' 


1.563:2,432]  :i,563;1.572| 

,1,573;1, 582| 


’2,423:2.432| 


2.433:3,476]  12.433:2.444) 

2.445:2.456] 


]3.465;3.476] 


INTONELG  ATTRIBUTE  VALUE  RANGES 
INTTWOLG  ATTRIBUTE  VALUE  RANGES. 


INTONELG 

Range 
of  Values 

INTTWOLG 
Range 
of  Values 

i4.695;6,086j 

14.695:4.710] 
j4. 711;4.726] 

]6. 071-.6. 086j 

j6.087:7.652] 

16.087:6.1041 

|6.105:6.122j 

]7. 635:7.652) 

17,653:9,372] 

j7. 653:7. 672) 
[7.673:7.692] 

19,353:9.372] 

Recall  that  the  attribute-based  data  model  considers  data  in  terms  of  several 
constructs,  two  of  which  are  directory  keywords,  and  non-directory  keywords.  The 
first  three  forms  of  information  are  the  values  of  the  template  attribute  and  the 
two  descriptor  attributes.  These  attributes  are  directory  keywords.  The  other 
attributes  of  the  record  are  non-directory  keywords. 

The  first  of  the  non-directory  keyword  attributes  and  the  fourth  form 
of  information  is  described  by  an  attribute  entitled  Multiple.  The  values  of  the 
multiple  attribute  are  character  strings  representing  how  many  times  the  database 
size  has  been  multiplied,  e.g..  "One".  "Two".  "Three",  .  .  .,  etc.  For  example,  if 
the  database  size  has  been  doubled,  the  value  of  the  multiple  attribute  is  "Two". 
The  effect  of  the  multiple  attribute  is  to  double,  triple,  etc.,  the  size  of  all  existing 
clusters  without  defining  any  new  clusters.  Further  detailed  discussion  of  the 
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clustering  concept  is  found  in  [Ref.  2..  In  the  response-time  reduction  record  file, 
the  multiple  attribute  always  has  a  value  of  "One".  The  response-time-invariance 
record  file  multiple  attribute's  value  ranges  from  "Two"  to  the  character  string 
representing  the  number  of  times  the  database  size  increases,  e.g..  for  a  system 
with  2  backends,  the  multiple  attribute  is  assigned  the  values  "One"  and  "Two". 
The  database  is  doubled  by  duplicating  the  original  record  file  and  appending  the 
duplicated  file  to  the  original  file  thus  creating  a  new  record  file  twice  the  size  of 
the  original.  However,  in  the  duplicated  file  the  multiple  attribute’s  value  is 
changed  from  "One"  to  "Two",  thus  every  record  in  the  new  database  is  unique. 
Table  2  in  chapter  II  characterizes  this  assignment. 

The  fifth  form  of  information  is  found  in  the  attribute  entitled 
String.  The  purpose  of  the  string  attribute  is  to  represent  sufficiently  large 
nonprocessing  data  in  order  to  make  up  the  size  of  the  record  classes.  All 
subsequent  attributes  are  string  attributes  as  well.  The  number  of  string 
attributes  is  dependent  on  the  record  sizes.  Recall  from  Strawser's  scheme  [Ref. 
2],  four  record  classes  are  defined  as  large,  medium- large,  medium,  and  small. 
While  the  large  record  size  is  a  function  of  the  disk  track  size  of  the  system,  the 
remaining  three  sizes  are  multiples  of  the  largest  record  size.  The  ratios  are  1,  1/2. 
1/5.  and  1/10. 

The  number  of  string  attributes  is  dependent  upon  the  size  of  the 
respective  record  class.  Because  MBDS  requires  all  attributes  in  a  record  to  be  the 
same  size,  a  common  divisor  to  all  four  record  classes  is  selected.  The  CAD  system 
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meets  this  requirement  by  soliciting  the  disk  track  size  in  increments  of  1000.  The 


increments  of  1000  can  best  represent  actual  disk  track  sizes  which  are  normally 
in  increments  of  kilo  bytes,  where  a  kilo  is  actually  1024  bytes.  In  receiving  the 
disk  track  size  in  increments  of  1000.  a  common  divisor  of  10  ensures  the  number 
of  attributes  for  each  record  size  conforms  to  the  record  class  ratios.  For  example, 
a  system  configured  with  a  4000  byte  disk  track  size  has  4  record  sizes  of  2000 
bytes,  1000  bytes.  400  bytes,  and  200  bytes.  By  selecting  a  common  divisor  of  10, 
we  set  the  attribute  size  to  10-bytes  per  attribute.  Table  12  show  the  number  of 
10-byte  attributes  corresponding  to  each  record  class. 


TABLE  12.  NUMBER  OF  10-BYTE  ATTRIBUTES  PER  RECORD  CLASS. 


Record  Size 
in  Bytes 

Number  of 
Attributes  j 

2000 

200 

1000 

100 

400 

40 

200 

20 

1 

A  2000  byte  record  has  200  attributes  of  which  196  are  string  attributes:  a  200 
byte  record  may  have  20  attributes  of  which  16  are  string  attributes.  Each  string 
attribute's  value  is  a  string  of  x's.  i.e..  "Xxxxxxxxx".  When  the  database  is 
queried  by  the  generated  transaction  mixes,  the  strings'  values  take  on  greater 
significance.  In  the  transaction  process  the  values  are  changed  to  more  meaningful 


textual  information.  This  process  is  explained  later.  Tables  13  and  14  depict  the 
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RECORD  LAYOUT  FOOTNOTES 


The  Attribute  Template's  value  is  a  character  string  representing 
one  of  four  record  sizes,  large,  medium-large,  medium,  and  small.  In  this 
example  the  record  sizes  represented  are  2000,  1000,  400.  and  200  byte  records. 

‘  The  names  INTONExx  and  INTTWOxx  represent  the  descriptor  attributes. 

The  Attribute  Multiple  reflects  the  size  of  the  database,  i.e.. 

One  for  size  N.  Two  for  2N,  Three  for  3\.  etc.  A  one-to-one  mapping  exists 
between  the  descriptor-id  set  and  the  value  of  this  attribute.  The  Multiple 
Attribute  is  used  to  double,  triple,  etc.,  the  size  of  all  existing  clusters, 
without  defining  any  new  clusters. 

*  The  number  of  string  attributes  is  dependent  upon  the  size  of  the 
respective  record  class,  i.e..  the  2000-bvte  record  may  have  200  attributes 
of  which  196  are  string  attributes:  the  200-bvte  record  may  have  20  attributes 
of  which  16  are  string  attributes. 


response-time  reduction  record  layout  for  a  small  database  for  a  system  featuring 
300  Megabytes  of  available  disk  storage  with  a  4000  byte  disk  track  size, 
d.  The  Generated  Test-Database  Report  Files 

Within  the  test-database-generation  component  two  report  files  are 
created  concurrently  as  two  relationships  are  calculated.  The  first  involves  the 
records-per-block  relationship.  Recall  from  chapters  I  and  II  that  the  disk  track 
size  is  the  block  size  by  which  MBDS  processes  information  from  secondary’ 
memory  on  the  backend  to  primary'  memory  on  the  controller.  Given  the 
requirement  to  pass  four  different  record  sizes  via  the  block,  a  records-per-block 
relationship  is  recognized.  Table  15  depicts  the  relationship  for  a  disk  track  size  of 


4000  bytes. 


The  second  relationship  involves  record-per-ciuster-category.  This 

relationship  is  much  like  the  first  but  a  bit  more'complex.  Recall  from  the  section 

TABLE  14 


LOGICAL  LAYOUT  OF  A 
RESPONSE  TIME  REDUCTION  RECORD  FILE 

(small  database) 

(4000  byte  disk  track  size) 

(300  Mbytes  of  available  disk  storage) 


Directory  Keywords 


1 

One 

TEMPLG 

2 

2 

One 

Non-Directory  Keywords 


S 


Xxxxxxxxx 


r  r  i 

%  *s 


TABLE  15.  THE  RECORDS-PER-BLOCK  RELATIONSHIP 


describing  the  CAD  characteristics  the  need  for  nine  cluster  categories.  This 
relationship  describes  the  number  of  records  per  cluster  category  recognizing  that 
each  cluster  category  has  a  different  number  of  blocks.  Table  16  depicts  this 
relationship. 

TABLE  16.  THE  NUMBER  OF  RECORDS  PER  CLUSTER  CATEGORY  TABLE 


2.  The  Generated  Test-Database  Program  Modules 

This  section  discusses  the  CAD  program  structure  for  the  generation  of 
the  test  databases.  Recall  from  the  section  on  Generating  the  Test-Database  Files, 
that  three  high  level  modules  are  invoked  to  generate  the  test-database  files: 

•  Make_template_file. 

•  Make  descriptor  files,  and 
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TABLE  15  THE  RECORDS-PER-BLOCK  RELATIONSHIP 


Record  Sizes 
in  Bytes 

Records 
per  Block 

2000 

2 

1000 

4 

400 

10 

200 

20 

describing  the  CAD  characteristics  the  need  for  nine  cluster  categories.  This 


relationship  describes  the  number  of  records  per  cluster  category  recognizing  that 


each  cluster  category  has  a  different  number  of  blocks.  Table  16  depicts  this 


relationship. 


TABLE  16  THE  NUMBER  OF  RECORDS  PER  CLUSTER  CATEGORY  TABLE 


Blocks 

Record  Sifce  in 

Bytes: 

Cluster: 

2000 

1000 

400 

200 

O 

4 

8 

20 

40 

*> 

u 

6 

12 

30 

60 

4 

8 

16 

40 

80 

0 

10 

20 

50 

100 

6 

12 

24 

60 

120 

7 

14 

28 

70 

140 

8 

16 

32 

80 

160 

9 

18 

36 

90 

180 

10 

20 

40 

100 

200 

2.  The  Generated  Test-Database  Program  Modules 

This  section  discusses  the  CAD  program  structure  for  the  generation  of 
the  test  databases.  Recall  from  the  section  on  Generating  the  Test-Database  Files, 
that  three  high  level  modules  are  invoked  to  generate  the  test-database  files: 

•  Maketemplate  file. 

•  Make  descriptor  files,  and 


fifi 


•  Make_template_file. 

•  Make_descriptor_files.  and 

•  Make_record_files. 

Each  module  is  tasked  to  generate  one  of  the  three  types  of  files  germane  to  the 
test-database  component. 

a.  The  Generate-Template-File  Module 

The  generate-template-file  module  creates  one  file  containing  four 
templates,  one  template  for  each  record  class.  For  each  template  the  module 
writes  to  the  file  all  the  attributes'  names:  the  template  name,  the  descriptor 
names,  the  name  multiple  for  the  multiple  attribute,  and  the  strings'  names.  To 
determine  the  appropriate  number  of  string  attributes,  the  module  accesses  the 
number-of-attributes-per- record-size  array  (see  Table  12). 

b.  The  Generate-Descriptor-Files  Module 

The  generate-descriptor-files  module  invokes  five  subordinate 
modules.  Each  of  these  modules  is  a  procedure  that  is  a  prerequisite  to  the 
following  modules  with  the  exception  of  the  Backendtable  module.  The  five 
subordinate  modules  are: 

•  the  Create  records  per  block  relationship  table  module. 

•  the  Create  the  record  per  cluster  category  table  module, 

•  the  Create  the  base  record  block  distribution  table  module. 

•  the  Backend  table  module,  and 

•  the  Write  descriptor  files  module. 

We  discuss  each  of  these  modules  in  turn. 
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(ll  The  Records- Per- Block- Relationship  Module.  This  module 
accesses  the  disk  track  size  variable  and  the  four  record  sizes  array  to  calculate  the 
relationships.  A  2x2  array  stores  the  relationship  between  the  records  per  block 
and  the  record  size  themselves.  For  each  record  size,  the  number  of  records  per 
block  is  calculated  by  dividing  the  disk  track  size  by  that  record  size. 

(2)  The  Records-Per-Cluster-Category  Module.  This  module  accesses 
the  records-per-block-relationship  array  and  another  array  that  stores  the  number 
of  blocks  per  cluster  category  information  to  create  a  records-per-cluster-categorv 
array.  For  each  cluster  category,  the  number  of  bytes  per  record  size  is  calculated 
by  multiplying  the  records  per  block  by  the  number  of  blocks  per  cluster. 

(3)  The  Base  Record-Block-Distribution  Module.  Table  7  depicts  the 
logical  layout  of  the  base-record-<L*-block-distribution  table  (base  table).  To 
complete  this  table  three  data  elements  must  be  accessed:  the  records-per-c luster- 
category-table  array,  the  key  number  of  records  variable,  and  the  number  of 
backends  variable  (num  be).  Columns  1.  2.  and  3  of  the  base  table  are  a 
duplicate  copy  of  the  information  stored  in  the  largest  record  size  of  the  records- 
per-c  luster-c  ategorv-t  able  array . 

Column  4.  the  total  number  of  clusters  per  cluster  category, 
involves  a  number  of  calculations.  The  first  calculation  involves  summing  the 
entries  in  column  3  of  the  base  table.  This  value  is  the  number  of  records  per 
cluster  category  for  all  nine  categories.  The  abbreviated  variable  name 
ttl  reds  distr  has  been  given  to  this  variable  for  denoting  the  total  number  of 


records  for  distribution.  This  value  is  then  divided  into  the  key  number  of 
records.  If  the  division  is  without  a  remainder,  then  the  total  number  of  clusters 
(cluster  size)  for  each  cluster  category  is  the  same.  Otherwise,  the  remainder  is 
truncated  and  1  cluster  is  added  to  the  cluster  size.  When  a  remainder  exists,  a 
record  overflow  is  incurred  by  the  addition  of  the  single  cluster.  The  record 
overflow  is  calculated  and  then  divided  by  the  average  number  of  records  per 
cluster.  This  variable  is  named  clus_id.  This  new  value  identifies  the  number  of 
cluster  categories  that  now  must  be  decremented  by  a  single  cluster  to  insure  the 
distribution  of  all  records  (key  number  of  records).  The  cluster  categories  selected 
to  be  decremented  are  evenly  distributed  across  all  9  categories. 

Column  5.  the  total  number  of  records  per  cluster  category,  is 
calculated  by  multiplying  each  row  entry  of  column  2  by  the  corresponding  row 
entry  in  column  3.  Column  6,  the  total  number  of  blocks  per  cluster  category,  is 
calculated  by  multiplying  each  row  entry  in  column  2  by  the  corresponding  row- 
entry  in  column  3. 

Column  7.  the  number  of  blocks  per  backend,  is  dynamic  in 
nature  and  is  dependent  on  the  number  of  backends  as  well  as  directly  associated 
with  the  configuration  number.  A  subordinate  module  is  invoked  to  calculate  the 
appropriate  values  for  a  given  case.  For  each  entry  in  column  7,  given  the 
configuration,  the  value  is  equal  to  the  total  numbers  of  blocks  per  cluster 
category  (column  6)  divided  by  the  number  of  backends  in  use. 


(4)  The  Backend-Table  Module.  The  backend-table  module  writes  a 


report  file  that  depicts  the  record-&-bIock-distribution  table  for  each  record  class 
(large,  medium-large,  medium  and  small)  within  each  database  (small,  medium, 
large).  The  base  record-^  -lock-distribution  table  is  accessed  and  those  columns 
which  are  dependent  on  the  record  factor,  database  factor,  or  both  factors  are 
multiplied  by  the  appropriate  factor  with  respect  the  record  class  and  the 
database. 

(5)  The  Write-Descriptor-Files  Module.  The  write-descriptor-files 
module  writes  a  descriptor  file  for  each  of  the  three  databases.  The  module 
accesses  the  template  names  in  the  t_name  array  and  writes  them  to  the  file.  The 
module  then  accesses  the  total  number  of  records  per  cluster  category  column 
(column  5)  of  the  base  table  to  determine  the  set  of  values  for  each  of  the  8 
descriptor  attributes.  The  module  uses  the  mathmatical  formula  devised  by 
Vincent  [refV.plOl]  and  the  database  and  record  factors  presented  as 
characteristics  of  the  CAD  to  calculate  the  values, 
c.  The  Generate- Record-Files  Module 

Each  database  has  its  own  set  of  record  files.  The  generate- record-files 
module  creates  a  dynamic  number  of  record  files  for  each  database  dependent  on 
the  number  of  backends.  Recall  from  the  discussion  of  the  record  file  that  two 


distinct  classes  of  record  files  are  generated  by  the  CAD  system.  One  set  is  the  rtr 
configuration  records  which  comprises  a  single  file.  For  each  of  the  rti 
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configurations,  a  separate  file  is  generated  and  whose  size  is  a  multiple  of  the  rtr 
file.  The  generation  of  the  record  files  is  accomplished  as  in  two  steps. 

In  the  first  step,  the  initial  file  to  be  opened  is  the  record  file 
representing  the  response-time-reduction  configurations.  The  size  of  this  particular 
file  is  the  base  size  for  all  the  multiple  size  rti  record  files.  For  each  of  the  record 
classes  the  appropriate  number  of  records  is  written.  First,  the  values  for  the 
descriptor  attributes  are  written  into  the  record  and  then  the  value  "One"  is 
written  for  the  multiple  attribute.  For  each  of  the  number  of  string  attributes 
(which  is  different  for  each  record  size)  the  value  "Xxxxxxxxx"  is  stored.  The 
total  number  of  records  to  write  is  determined  by  summing  all  of  the  entries  in 
the  total-number-of-records-per-cluster-category  column  (column  5)  and 
multiplying  the  sum  by  the  record  and  database  factors.  Once  this  evolution  has 
been  completed  for  each  record  class,  the  file  is  closed. 

In  the  second  step,  before  a  new  file  is  opened,  a  copy  of  the  last  file 
written  is  made.  Then  a  new  file  is  opened  and  appended  to  the  copy.  For  each  of 
the  response-time-invariance  configurations,  the  multiple  attribute  value  is 
increased  by  one.  This  evolution  terminates  when  the  requisite  number  of  record 
files  have  been  written.  The  entire  process  encompasses  both  the  rtr  file  and  the 
appropriate  number  of  rti  files.  The  two  steps  are  performed  for  each  database 
(large,  medium,  and  small). 


C.  THE  TEST-TRAXS ACTION-MIX-GENERATION  COMPONENT 

The  objective  of  the  transaction-mix  component  is  to  pass  to  the  TEST 
directory  all  the  transaction-mix  files  that  represent  the  four  record  classes.  Each 
of  the  three  database  sizes  has  four  transaction-mix  files,  one  for  each  record  class. 
The  transaction-mix  component  also  generates  a  number  of  report  files  that 
portray  information  concerning  the  mixes  themselves. 


1.  The  Generated  Test-Transaction-Mix  Files 

The  CAD  system  generates  three  database  sets,  a  large  database,  a 
medium  database,  and  a  small  database.  For  each  database  three  sets  of  files  are 
generated.  The  first  set  of  four  files  is  the  transaction-mix  files,  one  for  each  record 
class.  The  second  set  (four  files,  one  for  each  record  class)  is  a  printout  of  the 
transaction-mix  report  file.  The  third  set  (four  files,  one  for  each  record  class)  is  a 
printout  of  the  test-transaction-mix-workload  file.  The  second  and  third  sets  are 
used  as  an  integral  part  of  the  evaluator's  report.  As  each  transaction  request  is 
generated  for  a  specific  record  class  within  a  specific  database,  the  transaction-mix 
file,  the  transaction-mix  report  file,  and  the  transact  ion- mix- work  load  file  is 
written  concurrently. 

Appendix  B  includes  the  test-transaction-mix  report  and  workload  for  the 
large  record  class  of  the  small  database  for  a  system  with  3  backends,  a  4000  disk 
track  size,  and  375  Mbyte  disk  storage  capacity.  We  encourage  the  reader  to  refer 


to  the  appendix  as  the  remainder  of  the  section  is  being  read. 


a.  The  Transaction-Mix  Files 


Vincent  (Ref.  2:p.  104-120)  has  identified  30  transaction  requests  that 
encompass  the  five  types  of  transactions  found  in  MBDS,  the  retrieval,  the 
insert,  the  update,  the  delete,  and,  the  retrieve-common.  The  30 
transactions  are  further  grouped  into  7  request  sets,  each  set  designed  to  test  the 
system  with  overhead- intensive  or  data-intensive  type  of  operations.  The  30 
transactions  axe  created  for  each  record  class  within  a  database  and  comprise  a 
separate  file.  All  together,  twelve  transaction  files  are  written. 

The  CAD  system  presently  calculates  only  the  first  24  transactions  (6 
request  sets)  of  the  30  suggested  by  Vincent.  These  first  24  are  the  most 
important  with  regard  to  the  system  testing.  The  remaining  6  transactions  (the 
7th  request  set)  are  to  be  implemented  at  a  later  date. 

b.  The  Generated  Test-Transaction-Mix-Report  Files 

Two  types  of  report  files  are  generated  within  the  test-transaction- 
mix  component.  The  first  type  of  report  prints  the  actual  transaction  request  sets 
with  respect  to  each  record  class  within  each  database.  All  together,  twelve 
transaction  mix  files  are  generated. 

The  second  type  of  report  files  generated  in  this  component  are  the 
workload  statistics  for  each  of  the  requests  in  each  of  the  request  sets.  Again  (with 
respect  to  the  record  class  and  the  database)  twelve  workload  files  are  created. 
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The  Test-Transaction-Mix-Program  Modules  immediately  follow  the 
Test-Database  Program  Modules.  Within  the  test-transaction  mix  component 
there  is  only  one  high-level  module, 

•  Generatetransmixes. 

Within  this  high-level  module,  the  subordinate  Generate-database-transaction- 
mixes  module  is  invoked  for  each  of  the  three  databases,  small,  medium,  and 
large. 

a.  The  DB-Test-Transaction-Mix  Module 

For  each  of  the  four  record  classes  this  module  opens  three  files  (the 
transaction-mix  file  and  the  two  report  files)  and  invokes  6  subordinate  modules. 
Each  of  the  six  subordinate  modules  generates  one  of  the  six  request  sets. 

b.  The  Generate-Request-Set  Modules 

There  are  six  generate-request-set  modules.  Each  module  represents 
one  of  the  five  types  of  transactions.  Each  transaction  request  is  designed  to  test 
the  system  in  terms  of  overhead  or  data  intensive  operations.  For  each  request 
within  each  request  set,  the  template  and  descriptor  names  are  identified  and  the 
range  values  appropriate  for  that  specific  request  are  calculated.  Instrumental  to 
each  calculation  are  the  record  and  database  factors. 

In  determining  the  values  for  the  descriptor  attributes,  the  CAD 
system  targets  specific  cluster  categories  depending  on  the  nature  of  the  specific 


request.  For  example,  if  the  request  involved  25c7  of  the  database,  cluster  category 
4  would  be  the  target.  Within  cluster  category  4  the  record  range  value  for  25Sx  of 
the  database  would  be  found. 

Each  request  within  a  given  request  set  is  generated  by  a  separate 
subordinate  module.  Within  each  of  these  subordinate  modules,  the  test- 
transaction-mix  report  and  the  test-transaction-mix  workload  files  are  written. 
The  same  names  and  values  calculated  for  the  transaction  mix  are  written  to  the 
test-transaction-mix-report  file.  For  Request  sets  1,  2,  3,  5,  and  6  the  workload  of 
the  request  is  described  in  terms  of  the  number  of  clusters  examined,  the  volume 
of  the  database  accessed,  and  the  volume  of  the  database  transacted.  The 
workload  for  Request  4  is  unique  and  will  be  described  later. 

(1)  Generating  Request  Set  1.  Request  set  1  is  comprised  of  three 
retrieval  requests.  Request  #1  is  overhead-intensive  designed  to  retrieve  a  very 
small  selection  of  records.  Request  #1  accesses  the  first  cluster  category  and 
retrieves  1  cluster  of  records  per  each  backend.  Request  #2  is  data-intensive 
designed  to  access  2  cluster  categories  to  retrieve  a  small  selection  of  records. 
Request  #3  is  data-intensive  designed  to  retrieve  25%  of  the  database  records 
while  accessing  just  a  fraction  more  than  25%  of  the  database. 

(2)  Generating  Request  Set  2.  Request  set  2  is  comprised  of  three 
update  requests,  all  of  which  are  data  intensive.  Request  #4  identifies  1/8  of  the 
database  and  updates  STRINGOOl’s  value  from  "Xxxxxxxxx"  to  "Oneeighth". 
Request  #5  updates  1/4  of  the  database  and  updates  STRINGS005's  value  from 


"Xxxxxxxxx"  to  "Onequartr".  Request  -G  identifies  1/2  of  the  database  and 
updates  STRINGOlO's  value  from  "Xxxxxxxxx"  to  "Onehalf". 

(3)  Generating  Request  Set  3.  Request  set  3  is  comprised  of  five 
retrieval  requests,  all  of  which  are  data  intensive.  Request  #7  identifies  the 
descriptor  range  value  that  is  exactly  1/8  of  the  database  and  retrieves  the  records 
whose  STRINGOOl’s  value  is  "Oneeighth". 

Requests  #8,  #9,  and  #10  retrieve  portions  of  the  database  that 
are  1/8,  1/4,  and  1/2  of  the  database,  respectively.  They  identify  the  appropriate 
String  attributes  that  were  updated  in  Request  Set  2.  What  they  do  not  do  is 
identify  the  descriptor  range  value  that  is  exactly  1/8.  1/4.  and  1/2  of  the 
database,  thus  every  cluster  category  (100%)  of  the  database  is  examined! 
Request  #11  identifies  the  descriptor  range  value  for  1/2  of  the  database  whose 
STRINGOlO's  value  is  "Onehalf". 

(4)  Generating  Request  Set  4.  Request  Set  4  is  invoked  just  3  times 
because  it  performs  a  retrieve-common  operation  of  two  record  classes,  i.e..  of  the 
large  and  medium-large,  of  the  medium-large  and  medium,  and  of  the  medium 
and  small.  Request  set  4  is  comprised  of  three  requests.  Request  #12  is 
overhead-intensive.  Request  #13  is  data-intensive,  and  Request  #13  is  both 
overhead  and  data-intensive. 

Request  #12  uses  the  same  values  from  Request  #1  for  its  first 
retrieve.  Its  second  retrieve  value  is  a  multiple  of  its  first.  Request  #13  retrieves 


one  half  of  the  database  from  each  record  class.  The  STRING010  attribute's  value 


"Onehalf'  is  the  determinant.  Request  =14  targets  1/S  of  the  database  for  both 
of  its  retrieve  operations.  Request  #14  uses  the  same  values  from  Request  =7  for 
its  first  retrieve  and  makes  the  appropriate  calculations  for  the  second  retrieve. 
Because  no  string  attribute  is  targeted,  descriptor  attribute  values  are  the 
determinants. 

For  each  of  the  retrieve  operations  (the  first  retrieve  being  the 
source  request  and  the  second  retrieve  being  the  target  request),  the  workload  is 
described  in  terms  of  the  number  of  clusters  examined,  the  number  of  records 
accessed,  and  the  number  of  relevant  records  to  the  retrieve-common  request.  For 
the  retrieve-common  request  as  an  entire  entity,  the  number  of  records  retrieved 
resulting  from  the  request  is  also  depicted  in  the  workload.  All  the  workload  data 
elements  are  written  in  table  format  as  a  part  of  the  file. 

(5)  Generating  Request  Set  5,  Request  Set  5  is  comprised  of  two 
insert  requests.  Request  *  15  inserts  a  record  into  an  existing  cluster.  A  record 
from  the  present  record  class  is  duplicated  with  the  exception  of  the  multiple 
attribute.  The  new  value  placed  in  the  multiple  attribute  is  one  more  than  the 
maximum  possible  value  with  respect  to  the  number  of  backends.  E.g..  if  a  system 
has  had  3  backends  whose  last  response-time  invariance  configuration  has  a 
multiple  attribute  value  of  "Three",  the  new  value  would  be  "Four". 

Request  #16  inserts  a  record  into  a  new  cluster.  A  record  from 
the  first  cluster  category  is  duplicated  with  the  exception  of  the  INTWOxxx 
descriptor  attribute  (which  identifies  the  number  of  records  per  cluster).  Bv 


determining  the  maximum  range  vaiue  for  this  descriptor  in  this  specific  cluster 


category,  a  new  value  larger  than  the  maximum  range  value  is  selected  as  the 
appropriate  value.  By  defining  a  new  descriptor  whose  value  is  not  within  the 
range  limits  of  those  clusters,  a  new  cluster  is  created  for  that  record  to  be 
inserted. 

(6)  Generating  Request  Set  6.  Request  Set  6  is  comprised  of  eight 
delete  requests.  Request  #17  is  overhead-intensive.  Request  #18  is  both  data  and 
overhead- intensive,  and  Requests  #19  through  #24  are  data-intensive. 

Request  #17  and  #18  mirror  retrieve  Request  #1  and  #2, 
respectively.  Request  #20  -  #24  mirror  retrieve  Request  #7  -  #11,  respectively.  In 
all  of  these  requests,  the  workload  is  exactly  the  same  as  the  mirrored  request. 
Request  #19  corresponds  to  Request  #3’s  workload.  However  the  last  25%  of  the 
database  is  deleted.  Again  the  target  cluster  category  is  identified  (category  8) 
and  the  appropriate  value  is  calculated. 

D.  THE  REPORT  COMPONENT 

The  evaluator's  report  component  provides  the  user  with  a  detailed 
explanation  of  the  benchmarking  process.  The  report  is  comprised  of  standard 
text  files  interleaved  with  a  number  of  tables  encapsulating  data  and  statistical 
information  about  the  system  defined  by  the  user’s  input.  Given  the  three 
essential  elements  of  information  solicited  from  the  user,  the  CAD  system 
produces  a  report  that  provides  information  describing  four  main  topics:  the 
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generated  test  databases,  the  generated  test-transaction  mixes,  the  TEST 
directory  and  all  the  files  contained  therein,  and  the  instructions  on  how  to 
integrate  the  CAD  generated  test  files  with  MBDS. 

The  reports  describing  the  generated  test  databases  tore: 

•  the  test-configurations  report  for  each  of  the  given  databases. 

•  the  record-block-relationship  report  for  the  base  table, 

•  the  records-per-cluster-category-relationship  report  for  the  base  table,  and 

•  the  record-&-block-distribution-table  report  for  each  record  class  for  each 

given  database. 

The  reports  describing  the  generated  test-transaction  mixes  are: 

•  the  transaction-mix  report  for  each  record  class  within  a  given  database,  and 

•  th  transaction-mix  workload  for  each  record  class  within  a  given  database. 

Appendix  B  includes  examples  of  all  the  generated  report  files. 

The  standard  text  files  are  presently  being  drafted.  Once  drafted  and  placed 
within  the  CAD  system,  calls  are  to  be  coded  to  interleave  the  text  files  with  the 
generated  report  files.  The  purpose  of  the  text  files  is  two-fold.  The  first  objective 
is  to  explain  exactly  the  nature  of  the  generated  test  report  files  found  within  the 
TEST  directory.  The  second  objective  is  to  explain  interfacing  the  generated  test 
files  with  MBDS. 

Most  of  the  CAD  report  files  are  generated  concurrently  with  the  test  files 
within  the  other  two  major  components.  However,  the  CAD  system  does  call  one 
high-level  module  dedicated  to  generating  three  specific  sets  of  tables,  each  table 


being  a  unique  report  file.  Each  table  (or  file i  depicts  the  test  configurations  tor  a 
given  database,  small,  medium,  or  large. 

For  each  database  size,  the  module  formats  the  response- time- reduct  ion 
configurations  followed  by  the  response-time-invariance  configurations.  Several 
data  elements  must  be  accessed:  the  four-record-sizes  array  (four  red  sz).  the 
database-sizes  array  (dbase  sz),  the  number  of  backends  (num  be),  and  the 
number  of  configurations  (num  config).  For  the  rtr  configurations  the  database 
size  (in  Mbytes)  is  accessed  from  the  database-sizes  array.  Incrementing  through 
each  configuration  number  with  respect  to  the  number  of  backends,  the  number  of 
records  per  backend  is  calculated  as  well  as  the  Mbytes  per  backend  for  each 
record  class.  The  data  is  written  to  a  file  in  table  format. 

For  the  rti  configurations  the  same  algorithm  applies  with  the  exception  of 
increasing  the  database  size  (double,  triple,  etc.)  and  with  respect  to  the 


configuration  number. 


IV.  CONCLUSIONS 


Database  management  systems  have  taken  three  approaches  to  information 
processing,  the  traditional  mainframe-based  approach,  the  software  single-backend 
system,  and  the  software  multiple-backend  system.  The  performance  and  upgrade 
problems  identified  with  the  first  two  approaches  are  overcome  by  the  third 
approach,  the  software  multiple-backend  system,  by  providing  gains  through 
specialization  of  the  database  operations  on  dedicated,  multiple  backends. 

Two  goals  of  the  software  multi-backend  database  system  are  to  overcome  the 
performance  problems  and  upgrade  issues  of  the  traditional  mainframe-based  and 
the  conventional  software  single-backend  database  systems.  The  first  goal  is  to 
produce  a  reciprocal  decrease  in  the  response  times  of  the  user  transactions  by 
increasing  the  number  of  backends  while  the  size  of  the  database  and  the  size  of 
the  responses  to  the  transactions  remain  constant.  The  second  goal  is  to  produce 
invariant  response  times  for  the  user  transactions  by  increasing  the  number  of 
backends  proportionally  to  the  increase  of  the  transaction  responses.  The  first 
goal  allows  the  multiplicity  of  the  backends  of  the  database  system  to  be  directly 
related  to  the  performance  gains  of  the  database  system  in  terms  of  the  response¬ 
time  reduction.  The  second  goal  enables  the  multiplicity  of  the  backends  of  the 


system  to  be  directly  related  to  the  capacity  growth  of  the  system  in  terms  of 
response-time  invariance. 

To  verify  the  aforementioned  performance  and  growth-capacity  claims. 
Vincent  [Ref.  2]  has  formulated  a  benchmarking  methodology  for  software 
multiple-backend  database  systems.  In  this  thesis,  we  have  presented  a 
computer-aided  design  (CAD)  system  for  the  generation  of  test  databases  and 
test-transaction  mixes  that  can  be  used  for  the  purpose  of  benchmarking  parallel, 
multiple-backend  computer  systems,  specifically  the  Multiple-Backend  Database 
System  (MBDS). 

To  fully  understand  the  implementation  of  the  CAD  system,  we  have 
reviewed  the  essence  of  Vincent's  methodology,  specifically  the  test-databases  and 
the  test-transaction-mix  design  factors.  We  have  also  described  the  prototyped 
multi-backend  database  system  at  the  Laboratory  for  Database  Systems  Research. 
Naval  Postgraduate  School,  Monterey.  California,  including  a  discussion  of  the 
attribute-based  data  model  and  the  attribute-based  data  language. 

The  most  salient  features  of  the  CAD  system  are  two  characteristics  that  are 
an  integral  part  of  each  of  the  three  major  components.  The  first  characteristic 
involves  the  creation  of  a  number  of  factors  that  serve  as  multipliers  for  entities. 
Specifically  there  are  two  factors,  one  for  the  database  and  one  for  the  record 
classes.  The  second  characteristic  involves  the  creation  of  a  base-record-&-block- 
distribution  table  whose  numeric  entries  refl°ct  the  upper  bounds  on  the  number 
of  records,  blocks,  and  clusters  formed  and  distributed  across  the  backends  of 


MBDS.  The  values  in  the  table  are  used  to  generate  the  appropriate  number  of 
records  needed  for  a  precalculated  database  size.  They  are  also  used  to  determine 
the  range  values  of  descriptor  files,  and  they  serve  to  determine  the  values  in  the 
transaction  mixes  as  well. 

The  CAD  system  has  been  designed  to  receive  a  minimal  amount  of 
information  concerning  a  specific  database  system  from  the  user  an  to  transform 
this  input  into  the  requisite  test  databases  and  test-transacti>  mixes.  The 
framework  of  the  CAD  system  is  built  around  three  major  compoi.  nts,  the  test- 
database-generation  component,  the  test-transaction-mix-generation  component, 
and  the  evaluator- report  component.  The  final  output  resulting  from  the 
combined  work  of  the  major  components  is  the  CAD  system's  generation  of  two 
sets  of  files  placed  in  a  TEST  directory  for  the  user.  The  first  set  of  files  is 
expressly  for  testing  MBDS.  The  second  set  of  files  comprises  a  number  of  reports 
describing  the  test  databases  and  the  test-transaction  mixes.  In  conjunction  with 
the  second  set  of  files,  the  CAD  system  interleaves  a  number  of  standard  text  files 
that  present  a  narrative  for  the  evaluator  providing  instructions  on  how  to 
interface  the  CAD  generated  test-database  and  test-transaction-mix  files  with 
MDBS.  The  text  files  also  present  a  discussion  for  interpeting  and  analyzing  the 
empirical  data  calculated  by  the  CAD  system.  The  text  files  are  presently  being 
drafted  and  are  to  be  incorporated  into  the  CAD  upon  completion. 

The  CAD  system  described  in  this  thesis  is  a  first  version.  Presently  the  versin 
generates  the  test  databases  and  test-transaction-mix  files  and  places  them  into  a 


TEST  directory.  The  system  provides  the  user  with  an  evaluator's  report  that 


includes  instructions  on  how  to  manually  load  MBDS  with  the  generated  test  files. 
The  second  version  is  to  be  integrated  with  MBDS  allowing  the  testing  process  to 
be  controlled  and  managed  by  the  CAD  system.  The  third  version  will  add 
components  to  collect  statistics  (e.g.  response  times)  for  the  different  tests  and 
calculate  statistics  (i.e..  mean  and  standard  deviation  of  tests,  response-time 
reductions  and  response-time  invariances)  that  measure  the  performance  of 
MBDS. 


84 


>  j  |.|,|  i  *»  a>,  |> .  i’ll**  jV*kt_lVi>t.kVlW'l4<tVl^V4<i 


APPENDIX  A:  THE  CAD  SYSTEM  SPECIFICATIONS 


Task:  CAD  Benchmark  System 


/*The  following  are  global  arrays 

/*  t_name  :  an  array  of  the  four  template  names,  Templg.  Tempmedlg, 
/*  Tempmed,  and  Tempsmal  1 

/*  INT  l_name:  ii  an  array  of  the  four  descriptor  ONE  names.  INFONELG, 
/*  ~  INTONEvEDLG.  INTONEMED,  and  IYTONESM\LL 

/*  INT_2  _n  ame  j  !  :  an  array  of  the  four  descriptor  TWO  names.  INTBAOLG. 

■'*  INTIWMDLG,  INTCVNCKED.  and  I!YriVSOSM\LL 

/*  N4i  1 1 i p 1 e i  j  :  an  array  of  11  mu  1 1 i p 1 e  name  s . "One" , "Two" .  .  ."Eleven" 

/*  rcd_f ac tor |  j  :  an  array  of  the  record  factors,  1,  2.  5.  and  10 
/*  dbase _f ac t or [ i :  an  array  of  the  database  factors.  1.  2,  and  4 
'*  numat tr_per_rcd_sz j :  an  array  of  the  number  of  attributes  per 
'  *  record  size  (class) 

/*  r cd_pe r_b 1 k_r e 1 _t b 1 [ r , c i :  a  2x2  array  of  the  record-block  relation 

/*  rcd_per_c lus_cat _tbl i r , c :  a  9x4  array  of  the  cluster  category  and 
>*  the  record-block  relationship 

'*  base  r  cd_&  block  distr_table 

4  1cm  table:  an  array  of  lem's  indexed  by  the  number  of  backends 


perform  Rec  e i ve  us e r  _i npu t  (  num  be ,  dsk  t  rksz  ,  maxs  tor  age  dsk  s z  )  ; 
perform  I n i t i a  1 _c a  1 cu 1  at i ons ( numbe ,  dsk  trk  st ,  max  storage  sz); 
perform  Make_Templ  at  e_Fi  1  e  s  (  )  ; 

pe r f orm  Make  Des c r i pt or _Fi 1 es ( numbe .  dskt rk  sz . key  num_rcds , 

max_dsk_storage ) ; 

pe r f orm  Make  Record _Fi 1 e s  (  num  be  .  key  num  reds); 
perform  Gene r at e  Trans  Mix  Fi 1 es ( num_be ) ; 


perform  Format _tes t  bmark  dbs(num  be.  num  config): 
perform  Clean  Up ( ) ; 


end  task: 


procedure  Receive  userinputl  output  .  num  be.  dskt  rk_sz  , 

max  storage  dsksz): 

pe r f orm  Ge t SNumbe r _o f  Backend s  (numbe): 
per  form  Get  SDi  sk  Track_Size  ( ds k _t rk_s z ) ; 
perform  Ge  t $Di  sk_S c orage _S  i  z e  (max_s t orage_dsk 


end  procedure 


module  Get$ 


procedure  Number  of  Backends  (  output  :  num  be  )  : 


'*  num_be  is  the  number  of  backends  solicited 
’  from  the  user 


end  proc edur e : 


procedure  Disk  Track  Size  (  output  :  dsk  trk  sz)  ; 


end  procedure ; 


/*  dsk_trk_sz  is  the  disk  track  size  of  the 
/*  system  solicited  from  the  user 


procedure  Disk  Storage  Size  (  output:  max  storage  dsk  sz); 

/*  max  storage  dsk  sz  is  the  maximum  size  of 
■'*  the  disk  storage  (in  bytes).  This  value  is 
/*  inputted  by  the  user 

end  procedure; 


procedure  Initial  calculations  (  i 

perform  Calculate  SConfigu  ra t ions  i numbe .  num_c  on fig )  : 

perform  Calculated  cm(  num  be .  1  cm)  : 

perform  Calculate SFour  _Record_S i  z  e  s  (dst_trk_sz); 

pe r f orm  Cal cu 1  at e$Ava i 1 _Di  sk  Storage  (max_storage  dsk  sz) 

perform  Calculate  $Da t  abase  _Mu It i p 1 e ( 1  cm) : 

per  form  Cal cu 1  at e $Dat abas e_S i z e s ( dhm,  avail  dskstorage) 

perform  Calculate  $Cal c  _num_at  tr_for_ea_rcd_sz( )  ; 

perform  Calculate  SNumat  tr  f  or  _e  a_rcd_s  z ( ) : 

perform  Ca 1 c  u  1  a t  e$Key  nurnr  cds ( ) . 


end  procedure 


ro.TJT. ,  ”  v<.v  >.  v  v  .*■  k'*v* 


module  Calculates 


function  Configurations 


‘  1 nput :  num_be . 
output :  num  config); 


end  procedure: 


num_config  =  2*(num_be)  -  1: 

/*  num_config  is  the  number  of  test  configurations 
/*  for  each  of  the  databases,  small,  medium,  and 
/*  Urge 


function  1cm  (  i  nput  :  num_be  ,  1  cm_i  n f  o_pt  r  ’ 
output :  lcmj  ; 


Access  table  and  select  the  corresponding  1cm  value 
for  the  respective  num_be 

/*  1cm  is  the  least  conmon  multiple  based  on  the 
/*  number  of  backends,  num  be 


end  proc  edure  : 


procedure  FourRecordSizes  (  i  nput  :  dsktrksz,  red  factor  !. 

output :  four_rcd_szs  ilrs.  mlrs.  mrs,  srsj) 

Determine  the  four  record  sizes  based  on  Strawser's 
s  c  h  erne 

four_rcd_sz j 0  =dsk  trk  sz  ,  2: 

for  every  other  four_rcd_sz  ij.  i  ranging  1  to  3  do 


four  red  szii'  =  four  red  sz  0  red  factorii 


end  for 


end  procedure; 


/*  the  four  record  sizes  are  stored  in  a  global  array 
/*  four _rcd _s z [ | ,  Irs  is  large  record  size. 

*  mlrs  is  the  medium- large  record  size. 

/*  mrs  is  the  medium  record  size 
/  *  srs  is  the  sma  11  record  size 
/*  red  factor!  is  a  global  array 


» 


y] 


Dec  ermine  a  conmon  divisor  10  all  four  record  sizes. 
Divide  each  of  the  record  sizes  by  the  divisor. 

The  number  10  may  be  satisfactory. 

/*  num_at t r _pe r _r cd_s z  is  a  data  construct  containing 
'  *  the  respective  number  of  attributes  for  each  of  the 
/*  record  sizes:  large,  med-lg.  medium,  small 


end  procedure ; 


function  Calc  _key_num_rcds  (  output  :  key  _num_r  c  d  s  )  ; 

key_num_rcds  =  (  dbas  e  _s  z  [  sma  1  1  _db_s  z  i  .' num  r  cd  classes) 

four  red  sz'lg  redsz 

' ‘  dbase_sz'  is  a  global  array  */ 

"  num  red  e 1  asses  is  a  global  constant  *• 

*  four  red  sz'  is  a  global  array  * 


procedure  Make  Discript  or  _files(  nun  be  .  dsktrk  sz. 

key  numrcds  max  dsk  storage. 
0£t  pu  t_.  three  descriptor  files): 


perform  Initialize  clus  cat  tabie(): 

perform  Create  record  per_block_relationship_table(dsk_trk_sz): 
perform  Cr  eate  red  per  _c luster  _c  at  _tab!e( ) : 

pe r f orm  Cr e a t e _bas e _r ecord_b 1 ock  d  i  s t r i but i on_t ab 1 e ( num  be ,  key  num  reds' 
perform  Backend _c ab I e ( num  be )  : 

perform  Write  de s c r 1 p t or _fi 1 e s ( num_be ,  k ey  numr c d s ) ; 
end  procedure : 
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procedure  Create  records_per  block  relationship  table  i  i_nput_  dsk_trk_sz. 

output  rcdper  c 1 u s _c a i _t ab 1 e  r.c  /: 

*  The  following  code  creates  the  Records  per  Block  Relationship  Table 
’  which  is  a  global  array 

’The  following  variables  are  local  * 

/*  block:  the  disk  track  size  for  memory  transfer  * 

block  =  dsk_trk_sz 

for  each  record  size,  f our _rc _sz ; i .  do 
rcd_per_blk  =  block /rcd_szi 
write  rcd_sz  ij,  rcd_per_blk 

end  f or  ; 

end  procedure ; 


procedure  Create  rcd  per  cluster  catagcry  table  (  output  : 

ml  per  t  1  us  _c  at  _l  ab  1  e  i  r  .  c  )  ; 

access  block  per  cluster  information  (CCI  array) 

and  the  record-block  relationship  array  which  are  global 

scalar  multiply  the  r ec ords  pe r  b 1 oc k _r e 1  at i ons h i p_t ab 1 e  x  (0C1  array) 
and  store  in  r cd _pe r _c 1 u s _c a t _t b 1  array 

write  the  rcc  table  to  a  file: 


end  proc  edure : 


procedure  Create  base  record  4:  block  dist  table  (  input:  num  be 

key  numr  c d s 

2iLLE!L_  base  r  c  d  &  blk  distr  table  r.c  )  : 

'*  bas  e  _r  cd_4:_b  l  k  _d  i  s  t  r  _t  ab  1  e  r,c  :s  a  global  structure 

(col  0)Record_sz_i n_bv te  column  array 
(col  1 )Numbe r _o f _b 1 ocks  per  c luster_column  array 
/’  (col  2 )Numbe r _o f _r e c ord s  per  cluster  column  array  * 

'*  (col  3 )Tot a  1 _numbe r _o f  clusters  column  array 

(col  4)Tota)  number  of  records  column  array  * 

(col  5 )Tot a  1  numbe r _o f  blocks  column  array 

(col  6)A  structure  to  another  (dynamic)  table  dependent  * 
'*  on  "m"  configurations  * 

perform  Comp  1 e  t  e  _co 1  umn_0 (base  red  &  blk  distr  _t  ab 1 e  j  r . c i  ) : 

perform  Comp  1 e  t  e  _c  o 1  umns  1  _&  2(base  red  4;  blk  distr  table: r.c  ): 

perform  Complete_column_3(base_rcd_4  blk  distr  table  r.c); 

perform  Comp  1 e  t  e  co 1  umns  _4  (  bas  e  _r  c  d blk  distr  table  r,c  ); 

perform  Comp lete  col  umns  _5 ( bas  e  red  &  blk  distr  tabler.c1): 

pe  r  f  orm  Comp  1 e  t  e  _c  o 1  umn_6 ( bas  e_r  cd_&  blk  distr  table  r.c  ): 


end  procedure : 
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procedure  Complete  colnmns  1  i:  2() 

access  red  blk  distri  table  r.c 


copy  rcc_table  r,c  (large  record  size  size  only)  to  columns  2  &  3  in 
red  &_bl k_d i s t r i _t abl e . r , c ; 

i*  writes  r/cc  table  to  base  red  &  blk  table  *, 


end  procedure; 


"  **  At  ■**  *  a.  *  . 


procedure  Complete _coluimi_3  (  Rn  pu  t__  * e  >  _num_r  c  d  s  . 

output  :  ba  s  e  _r  cd_&:_b  1  k  _d  1  s  t  r  _t  abl  e  r.c  ): 

•'*  Completes  column  3  of  the  base  table  ’ 

f*  Performs  the  cluster  distribution  for  the  nine  cluster  catagories  * 

/*  The  following  are  local  variables:  * 

/*  ttl_rcds  distr  is  the  total  records  per  cluster  summed  over  *' 

/ *  all  nine  cluster  categories  * 

/*  case_id  identifies  which  cluster  categories  are  to  be  decre-  */ 

/*  mented  by  1  cluster  to  ensure  proper  distribution  * 

/*  equal  dist  is  a  flag  indicating  whether  or  not  all  9  cluster  */ 

/*  categories  are  to  receive  the  exact  same  number  of  * / 

/*  clusters  * 

/*  cluster  size  is  the  number  of  clusters  per  cluster  category  * 

ttl_rcds  distr  =  Cal cu 1  at e _t rd ( ) : 

cluster  site  =  Calculate  c 1  us t e r _s i z e ( key _num_rcds , 1 1 1 _r cds _d i s t . 

case_id.  equal_dist); 

access  the  column  3  of  the  rcd_t_blk  distr i  table  , 

"total  number  of  clusters"  column 

base  red _&_b 1 k  _d i s  t  r  _t  ab 1 e  j  r , 3  j 

for  each  cluster  catagory  in  the  above  column,  r  ranging  from  1  to  9  do 
base  red  &  blk  distr  tableir.3  =  cluster  size: 


1 


1' 


ft?' 


* 

& 


if  equal  disi  is  false 


access  the  num_o f _r cds _pe r _c 1  us t er  column  (column  2) 
in  the  red  L  blk  distri  tableir.2 


for  case_id( i ) 

i  =  1  subtract  1  from  cluster  catagory  5,  1  to  9  catagories 
i  =  2  subtract  1  from  cluster  catagories  1  and  9 

i  =  3  subtract  1  from  cluster  catagories  1,5  and  9 

i  =  4  subtract  1  from  cluster  catagories  1,2.8  and  9 

i  =  5  subtract  1  from  cluster  catagories  1.2. 5, 8  and  9 

i  =  6  subtract  1  from  cluster  catagories  1,2. 3. 7. 8  and  9 

i  =  7  subtract  1  from  cluster  catagories  1,2. 3, 5, 7. 8  and  9 


end  procedure 


i  =  8  subtract  1  from  cluster  catagories  1 , 2 . 3 , 4 . 6 . 7 , 8  and  9 
end  case 
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*  rj-.  Tnrr. 


function  Calcnlate  trrd  () 


sum  the  total  number  of  records  in  the  large  rcord  size  column 
of  the  r cd_pe r _c 1  us _cat  table(r/cc  table):  this  value  is  the 
t t 1 _rcds _d i s t  (total  records  for  distribution) 

/*  use  for  loop,  index  from  1  to  9  and  sum  row  entries  assigning 
/*  sum  total  to  ttl  reds  dist 


v  r 


£ 


end  procedure; 

function  Calcnlate  clnster  siie  (  input  :  key  num  reds  ,  t t 1  reds _d i s t , 

case_id,  equal_dist, 
output :  c 1  us t e r _s i z e . 

equa 1 _d i s t r i but i on .  caseid); 

/*  The  following  are  local  variables  */ 


/*  redoverflow  is  the  result  of  unequal  distribution 
/*  recorddefic i t  is  the  number  of  records  short  for 
/*  even  cluster  distribution 

/*  avg  num  reds  per  cluster 


*  / 

/ 


initialize  equal _di s t r i but i on  to  false; 
numc  1  us  _c  at  =  9  ; 

cluster  size  =  (keynumreds  /  ttl _ reds  dist); 

red  overflow  =  key  num  reds  *  ( c 1  us t er_s i z e  x  1 1 1  reds  di s t ) 

if  (redoverflow  =  0) 

equa 1 _d i s t r i bu t i on  =  true; 

else 

cluster_size  =  cluster  size  —  1; 
equa Id i s t r i bu t i on  =  false; 

avg _num_r cds _pe r _c 1  us t er  =  t t I _rcds _d i s t  '  num_ 
record  deficit  =  ttl  reds  dist  -  red  overflow; 
caseid  =  record  deficit  /  avg  num  reds  per  cluster: 

/*  redoverflow  is  the  record  overflow  if  the  number  of  */ 

/*  records  cannot  be  evenly  distributed  into  an  equal  number  of  * 

/*  clusters  considering  the  original  clustersize  calculation  * 

/*avg_num_rcds _per _c 1  us  ter  is  the  average  number  of  records  per  cluster 
/‘numc luscat  is  the  number  of  cluster  categories  (global  constant) 
/*case_id  is  the  case  identifier  which  determines  which  cluster  categories 
/*  will  be  adjusted  in  size,  i.e.  decremented  by  a  single  cluster 


c lus  cat 


end  procedure; 
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procedure  Complete  col  _4  () 


access  r cd _&_bl k _d i s t r i _t ab 1 e : 

to  calculate  the  value  for  each  row  entry  in  column  4, 
multiply  the  corresponding  row  entry  from  column  2  by 
the  corresponding  row  entry  from  column  3; 

end  procedure; 


procedure  Complete_col_5{) 

access  r cd_<fc_bl  k_di  s  t  r  i  _t  ab  1  e  ; 

to  calculate  the  value  for  each  row  entry  in  column  5, 
multiply  the  corresponding  row  entry  from  column  1  by 
the  corresponding  row  entry  from  column  3; 


end  procedure: 


procedure  Complete  col _6(mun_be) 

*  The  following  variables  are  local  * 

/*  num_rt r_configs  is  the  number  of  rtr  configurations  */ 

/*  ttl_num_b!ks  is  the  row  entry,  total  number  of  blocks  */ 

/’  extra_blocks  is  the  number  of  blocks  remaining  follow-*/ 

/*  ing  even  distribution  of  total  number  of  blocks  */ 

/*  across  the  backends  *  / 

/  *  b 1 ock _d i s t r i bu t i on  is  the  block  distribution  per  */ 

/*  backend  *,' 

access  r cd_&_bl k_d i s t r i _t ab 1 e j r  ,  6  ,  : 

num  rtrconfigs  =  num_be; 

do  the  1st  configuration 

copy  col  5  to  col_6 

if  m  >  1  do 

for  configurat i ons ( i )  ranging  from  2  to  numr t r  configs  do 
create  i  columns 

for  each  row  entry, r  do  r  ranging  from  1  to  9 

/’  9  times,  once  for  each  cluster  catagory  * 

read  ttl_num_of  blks  (row  entry)  of  rcd_it_bl  k_d  i  s  t  r  i  table 

extra_blocks  =  1 1 1  nurno  f  _b  1  ks  MX)  i  ; 

b 1 ockd i s t r i bu t i on  =  ttl  num  blks  j  i: 

for  ea  be_coiumn  entry. c  do  c  ranging  from  1  to  i 


be  col  entry  r.c  =  block  distribution: 


end  for 


'>>> 


if  extra  block? 


0 


*  do  nothing  * 


else 


for  x  =  1  to  extra_blocks 

be_column  i  r  ,  y  j  =  be_column  i r , y 
v  =  y  -  1; 
i  f  y  =  i 

y  =  i; 


x  =  x  —  1  : 
end  for 


end  i  f 
end  for 


end  for 
end  procedure 
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procedure  Write_discriptor_files(  put_pup_  3  descriptor  files): 

*  t  name  is  a  global  array  containing  the  four  template  names 

■  1NT  1  name  is  a  global  array  containing  the  four  INTxxl  discriptor  names 

*  INT_2  name  is  a  global  array  containing  the  four  INTxx2  discriptor  names 

' *  red  factor  is  a  global  array  containing  the  four  record  factors 

'*  dbase  factor  is  a  global  array  containing  the  three  database  factors 

for  each  dbase_factor  idbi,  db  ranging  from  1  to  3  do 

write  to  a  file 

dat abas e _i d  name  =  TEST 
TEN-P  c  s 

for  t  nrnej j  j  ranging  from  1  to  4 

write  the  character  !,  and  inane  jj 
end  for 

t  he  char  ac  t  e  r  @ 

access  basered  &_b 1 k _d i s t r _t ab 1 e j r . c 
for  INT  1  name  i  .  i  ranging  from  1  to  4  do 
write  the  IJSTl  _name  I  i  a,  i 
’  "a"  is  the  type  of  attribute,  i  stands  for  integer  * 
fir  s  t  va 1 ue  =  1 : 
secondval ue  =  0: 

factor  =  red  factorji  *  dbase  factor  db 
for  base  r cd _&_b I k  t ab 1 e  r.4  r  ranging  from  1  to  9  do 
temp  value  =  basered  4:  blk  distr  table  r.4  *  factor 

second  value  =  second  value  —  temp  value 
write  first  value,  secondvalue 
first  value  =  second  value  -  1 
end  for 

write  the  character 


i  ranging  fr om  1  i o  4  d o 


for  INT_2_name  i 

write  the  INT_2_name  i  a.  i 

/*  "a"  is  the  type  of  attribute,  i  stands  for  integer  * 
factor  =  rcd_f  ac  t  or  j  i  !  *  dbase  factor  j  db 
w  =  0 

for  each  entry  in  bas e  t cd_&_bl k_d i s t r i _t ab 1 e  r,2  r  ranging 
from  1  to  9  do 

x  =  base_rcd_Ac  blk  distri  table  r,2  *  red  factor  i 

tnc  =  base_rcd  L  blk  distri  tableir.3  *  db  factor  db 
for  each  value  y,  y  ranging  from  I  to  tnc  do 
first_value=w-i-  (x*y)  -  (x  -  1) 
second  value  =  w  ( x * y  ) 
write  first_value,  second  value 

end  for 
w  =  w  —  x*y 
end  for 

write  the  character  © 

end  for 

write  the  character  I 

send  file  to  directory  placing  routine 
end  for  ( dbas e _f ac t or  loop) 
end  procedure 
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procedure  Write _record_files(  ^nput^_  num_be  .  key  numrcds  . 

output:  record  files): 

*  The  following  are  global  data  elements: 

*  num  attr  per_rcd_sz  is  a  global  array  containing 

the  number  of  attributes  per  record  class 
/*  key_num_rcds  is  a  global  variable:  the  number  of  records  from 
/*  the  large  record  class  from  the  small  database  set. 

/*  num  be  is  the  number  of  backends 

/*  t_name  is  a  global  array  of  the  template  names 

/*  The  following  are  local  data  elements: 

/*  maxs t r i ngval ue  is  the  max  number  of  string  attributes  for  a  given 
!  *  record  size 

/*  strnum  is  a  loop  index  meaning  number  of  string  attributes 

for  each  dbase  factor  db  ,  db  ranging  from  1  t o  3  do 

for  each  mu  ltiplei.  i  ranging  f  rom  1  to  num  be  do 

write  to  a  file 

database  id  name  =  TEST 
the  character  © 

for  t_name  j;  j  ranging  from  1  to  4  do 

factor  =  rcdfactorjj  *  db_f ac tor i db j 
for  value  =  1  to  key  numrcds  *  factor  do 

write  t  _n  ame  j  j i 

write  value,  value,  mu  ltiplei:. 

max  string_value  =  num  stir  perrcd  sz  j 
for  strnum  =  1  to  max  siring  value  do 


write  Xxxxxxxxx 


end  for 


end  for 


write  the  character  © 

end  for 

if  mu  ltiple  i  >1  then 

gei  previous  file  written:  append  current  file 
to  previous  file 

send  file  to  directory  routine 


end  procedure 


perform  Generate_trans_mixes (  input  :  num_be. 

output:  12  transaction  mix  files 


Twelve  transaction  mix  files  are  generated,  one  per  record  c 
for  each  of  the  three  databases 


lass  V 


Twelve  transaction  mix  report  files  are  generated,  one  per 
record  class  for  each  of  the  three  databases 


/*  Twelve  transaction  mix  workload  files  are  generated,  one  per 
!*  record  class  for  each  of  the  three  databases 


For  each  of  the  three  databases,  db  do 


Perform  Gen_db_trans  _mi xe  s ( ) 


end  proc  edure ; 


Procedure  Gen  db  trans  mixes  () 


for  each  of  the  four  record  classes.  rs  do 


open  a  file  for  the  transaction  mixes; 


open  a  file  for  the  transaction  mixes  report: 


open  a  file  for  the  transaction  mixes  workload: 


factor  =  dbase  factor  db!  *  red  factorlrs;; 


perform  Genreqsetlf) 
perform  Gen_reqset2( ) 
perform  Gen_reqs  e  t  3  (  ) 
perform  Gen_reqsei4(  ) 
perform  Gen_reqset5( ) 
perform  Gen_reqset6( ) 


close  all  files 


end  for  1 oop 


end  proc  edure 
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Procedure  Gen_reqsetl() 

print  table  headings  for  the  report  file  and  workload  file 


perform  Reque  s  1 1 ( ) : 
perform  Request 2(  )  : 
perform  Request3(  )  ; 

end  procedure  ; 
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Request  1() 

OBJECTIVE:  Retrieve 

Assign  the  appropriate  template  and  descriptor  names. 

Given  the  number  of  backends  determine  the  minimum  number  of 
records  to  be  retrieved  such  that  each  backend  provides  the 
same  number  of  records.  The  target  cluster  category  is  1. 

The  records  to  be  retrieved  should  be  taken  from  the  middle 
of  the  cluster  category. 

The  values  selected  are  for  the  INTONExx  descriptor. 

For  the  work  1 oad : 

Calculate  the  number  of  records  accessed  by  counting  all  the 
records  in  the  first  category. 

Calculate  the  number  of  records  retrieved  by  multiplying  the 
number  of  backends  by  the  number  of  records  per  cluster  for 
cluster  category  1.  Remember  only  1  cluster  per  backend  is 
retrieved. 

end  procedure; 
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Reqnest2() 

OBJECTIVE:  Retrieve 
Request?  has  been  stubbed 
end  procedure: 

Request3  () 

OBJECTIVE:  Retrieve 


_ : 


Assign  the  appropriate  template  and  descriptor  names. 

Determine  2 5%  of  the  database,  and  calculate  the  I N’liVVDx x  descriptor 
value. 
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Request4() 

OBJECTIVE:  Update 

Assign  the  appropriate  template  and  descriptor  names. 

Calculate  1/8  of  the  given  database  and  assign  that  value  to 
the  INTOOxx  descriptor  attribute. 

For  the  workload: 

Determine  1/8  of  the  database.  Determine  which  cluster 
category  earmarks  1/8  of  the  database  and  calculate 
the  volume  of  the  database  accessed  by  counting  all  the  records 
in  that  cluster  plus  all  preceding  clusters,  then  divide  that 
sum  by  the  total  number  of  records  in  all  categories. 

The  volume  of  the  database  updated  is  12.50%. 

end  procedure: 


Requests  () 

OBJECTIVE:  Update 

Assign  the  appropriate  template  and  descriptor  names. 

Calculate  1/4  of  the  given  database  and  assign  that  value  to 
the  INIWJxx  descriptor  attribute. 
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For  the  work  1 oad : 

Determine  1/8  of  the  database.  Determine  which  cluster 
category  earmarks  1/4  of  the  database  and  calculate 
the  volume  of  the  database  accessed  by  counting  all  the  records 
in  that  cluster  plus  all  preceding  clusters,  then  divide  that 
sum  by  the  total  number  of  records  in  ail  categories. 

The  volume  of  the  database  updated  is  25.00%. 

end  procedure; 
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Requests  () 


OBJECTIVE :  Update 

Assign  the  appropriate  template  and  descriptor  names. 

Calculate  1/2  of  the  given  database  and  assign  that  value  to 
the  INIWDxx  descriptor  attribute. 

For  the  workload: 

Determine  1/2  of  the  database.  Determine  which  cluster 
category  earmarks  1/2  of  the  database  and  calculate 
the  volume  of  the  database  accessed  by  counting  all  the  records 
in  that  cluster  plus  all  following  clusters,  then  divide  that 
sum  by  the  total  number  of  records  in  all  categories. 

The  volume  of  the  database  updated  is  50.00%. 


end  procedure; 


Procedure  Gen_reqset3() 

print  table  headings  for  the  report  file  and  workload  file 

per f orm  Reque s t 7 ( ) : 
perform  Reque  s  1 8 ( ) ; 
perform  Request9(); 
perform  RequestlO(): 
perform  Requestll(): 

end  procedure; 

RequestT() 

OBJECTIVE:  Retrieve 

Half  of  the  database  is  to  be  accessed  and  .1/8  is  to  be  retrieved. 

Assign  the  appropriate  template  and  descriptor  names. 

Calculate  1/2  of  the  given  database  and  assign  that  value  to 
the  IOTONExx  descriptor  attribute. 

Assign  STRING001  attribute  the  value  Oneeighth. 

For  the  workload: 

Determine  1/2  of  the  database.  Determine  which  cluster 
category  earmarks  1/2  of  the  database  and  calculate 
the  volume  of  the  database  accessed  by  counting  all  the  records 
in  that  category  plus  all  preceding  categories,  then  divide  that 
sum  by  the  total  number  of  records  in  all  categories. 

The  volume  of  the  database  retrieved  is  12.50%. 

end  procedure ; 

Requests  () 

OBJECTIVE;  Retrieve 

All  of  the  database  is  to  be  accessed  and  1'8  is  to  be  retrieved 
Assign  the  appropriate  template  name. 

Assign  STRING001  attribute  the  value  Oneeighth. 

For  the  workload: 

The  volume  of  the  database  accessed  is  100. 00%. 

The  volume  of  the  database  retrieved  is  12.50%. 


end  procedure : 


Request9() 


OBJECTIVE:  Retrieve 

All  of  the  database  is  to  be  accessed  and  1  'A  is  to  be  retrieved. 

Assign  the  appropriate  template  name. 

Assign  STRING005  attribute  the  value  Onequartr. 

For  the  workload: 

The  volume  of  the  database  accessed  is  100.00%. 

The  volume  of  the  database  retrieved  is  25.00%. 

end  procedure: 

Request  10  () 

OBJECTIVE:  Retrieve 

All  of  the  database  is  to  be  accessed  and  1/2  is  to  be  retrieved. 

Assign  the  appropriate  template  name. 

Assign  STRING010  attribute  the  value  Onehalf. 

For  the  workload: 

The  volume  of  the  database  accessed  is  100.00%. 

The  volume  of  the  database  reprieved  is  50.00% 

end  procedure; 

Requestll() 

OBJECTIVE:  Retrieve 

Half  of  the  database  is  to  be  accessed  and  1/2  is  to  be  retrieved. 

Assign  the  appropriate  template  and  descriptor  names. 

Calculate  1/2  of  the  given  database  and  assign  that  value  plus  1  to 
the  INl'lVNOxx  descriptor  attribute 

Assign  STRING010  attribute  the  value  Onehalf. 

For  the  work  1 oad : 

Determine  12  of  the  database.  Determine  which  cluster 
category  earmarks  the  IXIWJXX  value  of  the  database  and  calculate 
the  volume  of  the  database  accessed  by  counting  all  the  records 
in  that  category  plus  all  following  categories,  then  divide  that 
sum  by  the  total  number  of  records  in  all  categories. 

The  volume  of  the  database  retrieved  is  50.00%. 


end  proc  edure  : 


Procedure  Gen_reqset4() 


print  table  headings  for  the  report  file  and  workload  file 

pe r f orm  Reque s t 1 2  (  )  : 
perform  Requestl3(  )  ; 
perform  Requestl4( )  : 

Note  that  the  workload  for  Request4  involves  the  following: 
Number  of  clusters  examined  by  the  first  (source)  retrieve 
Number  of  records  accessed  by  the  source  retrieve 

Number  of  records  relevant  to  the  source  retrieve 

Number  of  clusters  examined  by  the  second  (target)  retrieve 
Number  of  records  accessed  by  the  target  retrieve 

Number  of  records  relevant  to  the  target  retrieve 

Size  of  the  resulting  record  set  (in  number  of  records) 

end  procedure; 


Requestl2() 

OBJECTIVE:  Re t r i eve -Common 

To  access  a  small  selection  of  records  in  the  first  cluster 
category  of  back  to  back  record  classes  and  return  those 
records  that  share  counon  INTTONExx  attributes  within  each 
respective  record  class. 

Assign  the  appropriate  template  and  descriptor  names 

The  descriptor  values  from  Requestl  are  used  for  the  first  retrieve 
and  the  upper  range  value  is  multiplied  by  the  record  factor 
as  the  value  for  the  second  retrieve. 

For  the  workload. 

The  workload  for  the  first  retrieve  is  the  same  as  Requestl. 

The  workload  for  the  second  retrieve  is  calculated  by  determining 
the  number  of  clusters  identified  in  cluster  category  1  and 
determining  the  appropriate  number  of  records  accessed  in  that 
category . 

The  result  record  set  is  the  same  as  the  number  of  records  relevant 
by  the  source  retrieve. 

end  proc  edure . 
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Request  13  () 


OBJECTIVE:  Re t r i eve-Conmon 

For  each  r e t r i eve  .  .  ac c e s s  all  the  records  from  the  record  class 
such  that  only  1/2  the  records  are  relevant  to  that  retrieve 

Retrieve  those  records  that  share  corrmon  INTTONExx  attributes 
within  each  respective  record  class. 

Assign  the  appropriate  template  values. 

Assign  the  value  "Onehalf"  to  the  STRING010  attribute. 

For  the  workload: 

All  cluster  categories  are  examined  by  both  the  source  and 
target  retrieve.  Make  the  appropriate  assignments. 

All  records  are  accessed  by  both  the  source  and 
target  retrieve.  Make  the  appropriate  assi gnmen t  s . 

The  number  of  records  relevant  is  half  of  the  total  number  of 
records  for  each  given  record  class. 

The  result  record  set  is  the  same  as  the  number  of  records  relevant 
by  the  source  retrieve. 


end  procedure . 


Request  14  () 


OBJECTIVE:  Re  t  r  i  eve -Cornnon 


For  the  source  retrieve,  access  all  the  records  from  the  record 
class  such  that  only  1/2  the  records  are  relevant  to  that  retrieve. 

For  the  target  retrieve,  access  1/16  the  records  of  the  second 
record  class. 

Retrieve  those  records  that  share  corrmon  INTTONExx  attributes 
within  each  respective  record  class. 

Assign  the  appropriate  template  and  descriptor  navies. 

Determine  half  the  number  of  records  for  the  first  retrieve  and 
assign  this  value  to  the  INTONExx  descriptor  attribute  for 
the  first  retrieve  and  also  as  the  upper  range  value  IKTONExx 
descriptor  in  the  second  retrieve. 

Determine  1/8  of  the  number  of  records  for  the  first  retrieve  and 
subtract  this  value  from  the  (half  the  number  of  records)  value. 
This  new  value  is  the  lower  range  value  for  the  IOTONExx 
descriptor  attribute  of  the  target  retrieve. 

For  the  workload: 

For  the  first  retrieve: 

Identify  the  target  cluster  category  and  sum  up  all  the  clus¬ 
ters  in  this  and  all  preceding  categories  for  the  total  number 
of  clusters  examined.  Do  the  same  for  all  the  records  accessed. 

The  number  of  records  relevant  to  the  source  is  exactly  1/2  the 
total  number  of  records. 

For  the  second  retrieve: 

Identify  the  target  cluster  category.  The  total  numbe r  of 
clusters  in  this  category  is  the  number  of  clusters  examined 
Do  the  same  for  the  number  of  records  accessed  by  the  target 
request.  The  number  of  records  relevant  to  the  target  request 
is  the  difference  between  its  II^fTONExx  descriptor  values. 

The  result  record  set  is  the  same  as  the  number  of  records 
relevant  by  the  target  retrieve. 


end  procedure  ; 
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Procedure  Gen_reqseto() 

print  table  headings  for  the  report  file  and  workload  file 

perform  Request 15( ) : 
perform  Requestl6j) : 

end  procedure ; 


Request  15  () 

OBJECTIVE:  Insert  (a  record  into  an  existing  cluster) 

Assign  the  appropriate  template  and  descriptor  names. 

Determine  the  number  of  backends  in  the  system:  the  value  for 
the  multiple  attribute  is  one  greater  than  the  number  of  backends 
Access  the  number  of  attributes  per  record  class  and  write  the 
appropriate  number  of  string  values,  Xxxxxxxxx. 

For  the  work  1 oad : 

No  clusters  are  examined  and  the  database  is  not  accessed. 

One  record  is  inserted. 

end  procedure : 


Request  16  () 

OBJECTIVE:  Insert  (a  record  into  a  new  cluster) 

Assign  the  appropriate  template  and  descriptor  names. 

To  determine  the  new  value  for  the  descriptor  attribute: 

Determine  the  number  of  records  in  the  first  cluster  category. 
Multiply  this  value  by  100.  then  divide  by  the  number  of  clusters 
in  the  first  cluster  category.  The  result  is  the  new  value  for 
the  descriptor  attribute. 

The  multiple  attribute’s  value  is  "One" 

Access  the  number  of  attributes  per  record  class  and  write  the 
appropriate  number  of  siring  values.  Xxxxxxxxx. 

For  the  work  load: 

No  clusters  are  examined  and  the  database  is  not  accessed. 

One  record  is  inserted. 

end  proc  edure ; 
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The  descriptor  attribute  values  are  the  same  as  Request  1 
For  the  workload: 

The  workload  is  the  same  as  request  1. 
end  procedure  : 


Reqnestl8() 

OBJECTIVE:  Delete 
Request  18  has  been  stubbed 
end  procedure: 


Request  J  9() 

OBJECTIVE:  Delete 

Assign  the  appropriate  template  and  descriptor  names 

The  descriptor  attribute  value  is  calculated  as  follows: 
Subtract  2 h%  of  the  total  records  fr om  the  total  records  value 
The  result  is  the  descriptor  value. 
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For  the  workload: 

To  determine  the  volume  of  the  database  deleted: 

Sum  all  the  records  between  the  descriptor  attribute  value 


value 

'o  determine  the  volume  of  the  database  accessed: 

Identify  the  cluster  that  stores  the  target  descriptor  value. 

Sum  all  the  records  in  this  cluster  plus  all  the  records  in  those 
clusters  that  follow  up  through  the  last  cluster  category.  Divide 
the  sum  by  the  total  number  of  records. 

idure ; 
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JBJECTIVE.  Delete 

Assign  the  appropriate  template  and  descriptor  names. 

rhe  descriptor  attribute  values  are  the  same  as  Request  7. 
rhe  string  assignment  is  the  same  as  Request  7. 

ror  the  workload: 

The  workload  is  the  same  as  request  7. 
idure  . 


0 

»JECTIVE:  Delete 

Assign  the  appropriate  template  name. 

rhe  string  assignment  is  the  same  as  Request  8. 

'or  the  workload: 

The  workload  is  the  same  as  request  8 
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»JECTIVE:  Delete 

Assign  the  appropriate  template  name. 

fhe  string  assig  nme  n  t  is  the  s  ame  as  Request  9 


or  the  wor k 1 oad : 

The  workload  is  the  same  as  request  9. 


Request  23  () 


OBJECTIVE:  Delete 

Assign  the  appropriate  template  name. 

The  string  assig  nme  n  t  is  the  s  ame  as  Request  10 
For  the  workload: 

The  workload  is  the  same  as  request  10. 
end  procedure. 


Request24() 

OBJECTIVE:  Delete 

Assign  the  appropriate  template  and  descriptor  names 

The  descriptor  attribute  value  is  the  same  as  Request  11 
The  string  assig  nme  n  t  is  the  s  ame  as  Request  11. 

For  the  workload: 

The  workload  is  the  same  as  request  11. 


end  procedure : 


procedure  Format  test  benchmark _dbs  (  num  be  .  numconfig. 

output:  Ig  db  set.  med  _d  b  set 
sma  1 1 _db  set): 

*  This  procedure  generates  the  t e s t _benc hmarkdat abas e s 
/*  for  the  report 

for  each  database  size, 

write  a  separate  database  set  file  (large,  med.  small)  such  that 
dbsziij,  i  ranging  fromO  to  2  do 

/*  The  following  code  will  generate  the  response  time  * 

/*  reduction  configurations  *, 

for  b  ends  in  use  =  1  to  numbe  do 
config  num  =  num  be 

write  config_num 

mby t e s _pe r  be  =  (db  sz  i  ,4)  numbe 

*  dividing  the  db  size  i  by  4  distributes  * 

*  the  database  evenly  to  each  of  the  record* 

‘classes  * 

write  bendsin  use 

for  each  rcdsz  p  .  p  ranging  from  0  to  3  do 
num  red  per  be  =mbytes  per  be  rcdsz 

write  red  sz  p  . 

write  num  redperbe.  mbytes  per  be 

end  do 

write  dbsz  i 

end  do 


The  f o i  i ow ing  rode  will  general"  the  response  l  t me 
invariance  configurations 

confignum  =  config_num  -  1: 
b  ends  _i n  use  =  2: 

while  bends _i nuse  is  less  than  or  equal  to  num  be  do 
write  confignum.  b_ends_in  use 
mby tes  per  be  =  db  sz  ii/4 

db_sz_in_use  =  (mby  t  es  _per_be  )  *  (bends  jn  use) 

for  each  rcdszp  p  ranging  from  0  to  3  do 

numrcdper  be  =mbytes  per  be  red  sz  p 

write  red  szipi,  num  red  per  be, 
write  mbytesper  be 

end  do 

write  db_s  z  in  use 
b  end  s  in  use  =  b  ends  inuse  -  1 
con fig_num  =  con  fig  num  -  1 
end  while 

end  for ; 
end  procedure : 

'*  bend s _ i n _u s e  is  the  number  of  backends  used  for  a  specific  configuration 

*  config  num  is  a  specific  configuration  (number). 

*  redsz  is  one  of  four  record  sizes  from  four  red  sz( 

*  num  redper  be  is  the  number  of  records  per  backend  (per  record  class) 
dbjz  in  use  is  the  database  size  created  for  the  invariance  configur- 

*  a  t  i  o  n  s 

*  mbytes  per  be  is  Mbytes  per  backend  (per  record  class) 
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APPENDIX  B:  CAD  GENERATED  REPORTS 

The  reports  contained  in  this  appendix  comprise  only  those  generated  for 
the  small  database  set.  The  CAD  system  generates  similar  reports  for  the  large 
and  medium  database  sets  as  well.  The  reports  are  based  upon  a  system  defined 
to  have  the  following: 


maximum  disk  storage: 

375 

Megabytes 

available  disk  storage: 

300 

Megabytes 

disk  track  size: 

4000 

bytes 

number  of  backends: 

3 

The  first  set  of  reports  are  generated  by  the  Test-Database  Component.  The 
second  set  of  reports  are  generated  by  the  Test-Transaction  Mix  Component. 


TEST-DATABASE  COMPONENT  REPORT:* 


THE  RECORDS-PER-BLOCK  RELATIONSHIP 


THE  NUMBER  OF  RECORDS  PER  CLUSTER  CATEGORY  TABLE 


Blocks 

per 

Cluster: 

2000 

1000 

2 

4 

8 

i 

20 

40 

3 

6 

12 

30 

60 

4 

8 

16 

40 

80 

5 

10 

20 

50 

100 

6 

12 

24 

60 

120 

7 

14 

28 

70 

140 

8 

16 

32 

80 

160 

9 

18 

36 

90 

180 

10 

20 

40 

100 

200 

V.  t 
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SMALL  DB  TEST  CONFIGURATION > 


Configuration 

Number 

Record 

Number  of 

Mybytes 

Database 

Number 

of 

Size  in 

Records  per 

per 

Size  in 

Backends 

Bytes 

Backend 

Backend 

Mbytes 

1 

1 

2000 

9372 

18744000 

1000 

18744 

18744000 

400 

46860 

18744000 

200 

93720 

18744000 

74.976 

2 

2 

2000 

4686 

9372000 

1000 

9372 

9372000 

400 

23430 

9372000 

200 

46860 

9372000 

74.976 

3 

9 

O 

3124 

6248000 

6248 

6248000 

WEm 

15620 

6248000 

KB 

31240 

6248000 

74.976 

4 

2 

2000 

9372 

18744000 

1000 

18744 

18744000 

400 

46860 

18744000 

200 

93720 

18744000 

149.952 

5 

9 

«> 

2000 

9372 

18744000 

1000 

18744 

18744000 

400 

46860 

18744000 

_ 

200 

93720 

18744000 

224.928 
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RECORD  BLOCK  DISTRIBUTION  TABLE 
SMALL  DATABASE 
THE  NUMBER  OF  BACKENDS:  3 


Record 

Number  of 

Number  of 

Total 

Total 

Total  | 

Site 

Blocki 

Recordi 

Number 

Number 

Number 

in 

per 

per 

of 

of 

of 

Bytet 

Clutter 

Clutter 

Clutters 

Records 

Biockt 

2000 

2 

4 

86 

344 

172 

3 

6 

87 

522 

261 

4 

8 

87 

696 

348 

5 

10 

87 

870 

435 

6 

12 

87 

1044 

522 

7 

14 

87 

1218 

609 

6 

16 

87 

1392 

696 

9 

18 

87 

1566 

783 

10 

20 

86 

1720 

860 

1000 

2 

8 

86 

688 

172 

* 

12 

87 

1044  ' 

261 

4 

16 

87 

1392 

348 

5 

20 

87 

1740 

435 

6 

24 

87 

2088 

522 

7 

28 

87 

2436 

609 

8 

32 

87 

2784 

696 

9 

36 

87 

3132 

783 

10 

40 

86 

3440 

860 

400 

2 

20 

86 

1720 

172 

3 

30 

87 

2610 

261 

4 

40 

87 

3480 

348 

5 

50 

87 

4350 

435 

6 

60 

87 

5220 

522 

7 

70 

87 

6090 

609 

8 

80 

87 

6960 

696 

9 

90 

87 

7830 

783 

10 

100 

86 

8600 

860 

200 

2 

40 

86 

3440 

172 

9 

*4 

60 

87 

5220 

261 

4 

80 

87 

6960 

348 

5 

100 

87 

8700 

435 

6 

120 

87 

10440 

522 

7 

140 

87 

12180 

609 

8 

160 

87 

13920 

696 

9 

180 

87 

15660 

783 

10 

200 

86 

17200 

860 

y.y.y 
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TEST-TRANSACTION  MIX  COMPONENT  REPORTS 


SMALL  DATABASE 
LARGE  RECORD  CLASS 
RECORD  SIZE:  2000 


REQUEST  SET  1 


Request 

Number: 

RETRIEVAL  Request 

Queries: 

i 

((TEMP=T«mplg)and(INTONELG>  =  109)and(INTONELG<  =  120)) 

9 

a* 

((TEMP=TempIg)and(INTTWOLG<=2343)) 

REQUEST  SET  2 


Request 

Number: 

UPDATE  Request 

Queries: 

4 

((TEMP=T«mplg)and(INTTWOLG<  =  U72))(STROOOOO)  =  Oneeighth) 

5 

((TEMP=Templg)and(INTTWOLG<=2343))(STR000005  «  Onequartr) 

6 

((TEMP=Tempig)and(INTTWOLG  >  4686))(STR000010  =  Onehalfl 

Request 

RETRIEVAL  Reques: 

Number: 

Queries: 

7 

((TEMP=Templg)and(INTTWOLG  <=  4686)and(STR000001  =  Oneeighth)) 

6 

( ( TEMP=T«mplg  )and(  STR000001  =  Oneeighth)) 

9 

((TEMP=TempIg)and(STR000005  =■  Onequartr)) 

10 

((TEMP-Templg)and(STROOOOlO  =  Onehalf)) 

11 

((TEMP=Templg)and(INTTWOLG  >=  4687)and(STR000010  =  Onehalf)) 

REQUEST  SET  4 


Request 

Number: 

RETRIEVE-COMMON 

Request  Specifications: 

12 

RE  TRIE  VE(  ( TEMP = Templg  )and(  IN  TONELG  >  =  1 09 ) 
and(INTONELG<  =  120) 

COMMON(INTONELGJNTONEMEDLG) 

RE  TRIE  VE((  TEMP =Tempmedfg )and(INTONEMEDL  G  <  =  240 ) ) 
(INTONEMEDLG) 

13 

RE  TRIE  VE(  ( TEMP=Tempig)and(STR  000010=  Onehalf) 
(INTTWOLG) 

COMMON(INTONELGJNTONEMEDLG) 

RE  TRIE  VE((  TEMP  =  Tempmedlg)and(STR000010=Onehalf) 
(INTTW  OMEDLG ) 

14 

RE  TRIE  VE(!  TEMP  =  Templg)and(INTONELG<  =4686)) 
(INTONELG) 

COMMON)  INTTWOLG. INTTW  OMEDLG ) 

RETRIEVE!  ( TEMP  =  Tempmedlg  )and(INTONEMEDLG  >  =  3515) 
and(INTONEMEDLG<  =  4686)  ((INTONEMEDLG) 

Reouest 

INSERT  Request  ) 

Number: 

Queries: 

IS 

(<TEMPLATE.Templg>.<INTONELG.l>.<INTTWOLG,l>,<MuUiple,Four>, 

<STROOOOOl,Xxxxxxxxx> . <STR000196.Xxxxxxxxx>) 

16 

(<  TEMPLATE, Templg>,<INTONELG.l>.<INTTWOLG,400>.<Multiple,One>, 

<STR000001,Xxxxxxxxx> . <STR000196.Xxxxxxxxx>) 

REQUEST  SET  6 


Request 

DELETE  Request 

Number: 

Queries: 

17 

((TEMP=Templg)and(INTONELG>  =  109)and(lNTONELG<  =  120)) 

19 

((TEMPLATE  =  Templg)and(INTTWOLG>  =  7030)) 

20 

((TEMP- Templg)and(INTTWOLG  <=  4686)and(STR000001  =  Oneeighth)) 

m 

((TEMP=Templg)and(STR000001  =  Oneeighth)) 

q 

((TEMP=Templg)and(STR000005  =  Onequartr)) 

23 

((TEMP=Templg)and(STR000010  =  Onehalf)) 

24 

((TEMP=Templg)and(lNTTWOLG  <=  4687)and(STR000010  =  Onehalf)) 

tTi' 
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REQUEST  SET  f.  W  ORKLOAD 
DELETE  QUERIES 


Volume  of 

Volume  of 

Accessed 

Deleted 

344  records 

12  records 

25.07  % 

24.99  % 

50.0854  % 

12.5000  % 

100.00  % 

12.51  % 

100.00  % 

25.00  % 

100.00  % 

50.00  % 

50.0640  % 

50.0107  % 
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